(54) Title: DIRECTED EVOLUTION OF OXIDASE ENZYMES



(57) Abstract: This invention relates to the expression of improved polynucleotide and polypeptide sequences encoding eukaryotic
enzymes, particularly oxidase enzymes. The enzymes are advantageously produced in conventional or facile expression systems.
Various methods for directed evolution of polynucleotide sequences can be used to obtain the improved sequences. The improved
characteristics of the polypeptides or proteins generated in this manner include improved expression, enhanced activity toward one
or more substrates, and increased thermal stability. In a particular embodiment, the invention relates to improved expression of the
galactose oxidase gene and galactose oxidase enzymes. GAO mutants that are highly active and/or thermostable are disclosed.
Directed Evolution of Oxidase Enzymes

This invention is concerned with the production of modified enzymes, particularly 
oxidase enzymes, and more particularly galactose oxidase enzymes. Recombinant techniques 
such as directed evolution are used to obtain polynucleotide and polypeptide products having 
desirable properties. Galactose oxidase variants with increased activity and increased 
thermostability relative to the wild-type enzyme are described.

BACKGROUND OF THE INVENTION 

An "oxidation enzyme" is an enzyme that catalyzes one or more oxidation reactions,
typically by adding, inserting, contributing or transferring oxygen from a source or donor to
a substrate. Such enzymes are also called oxidoreductases or redox enzymes, and
encompass oxygenases, hydrogenases or reductases, oxidases and peroxidases. One such
enzyme is galactose oxidase. This invention relates to the selection and production of
polynucleotides that encode polypeptides or proteins with biological activity as oxidation
enzymes, and in particular galactose oxidase enzymes. These enzymes are produced in facile
expression systems such as robust prokaryotic cells (e.g. bacteria) and eukaryotic systems (e.g.
fungi and yeast).

FIELD OF THE INVENTION 

The invention concerns the recombinant production of functional eukaryotic proteins
by host cells, in high yield, with increased activity, and/or with increased stability, e.g.
thermostability. Preferred proteins of the invention include oxidase enzymes (oxidases) such
as polypeptides evolved from galactose oxidase (D-galactose:oxygen 6-oxidoreductase or
GAO; EC 1.1.3.9). Polynucleotides which encode and express these proteins in recombinant
host cell expression systems, and the resulting polypeptides, are encompassed by the invention.
The publications and reference materials noted herein and listed in the appended
Bibliography are each incorporated by reference in their entirety. They are referenced 
numerically in the text and the Bibliography below. 

Production of Enzyme Variants 

Many proteins of interest are produced by organisms having "eukaryotic" cells. These 
are cells having a nucleus surrounded by its own membrane and containing DNA on 
structures called chromosomes. All multicellular organisms, such as humans and animals, and 
many single-cell animals, have eukaryotic cells. Other single-cell organisms, such as bacteria,
have "prokaryotic" cells. These cells have a primitive nucleus with DNA in a defined
structure, but without the chromosomes and nuclear membrane that are characteristic of
eukaryotes. Prokaryotic organisms are generally much easier and less costly to grow, maintain 
and manipulate than eukaryotic cells. 

Genetic engineering and recombinant DNA and RNA technologies have made it 
possible to produce proteins, hormones and enzymes that are native to one organism, by using 
the cells of a different organism as "factories" or host cell expression systems. In particular, 
it is often desirable to express a protein of eukaryotic origin in a prokaryotic host cell, because 
the prokaryotes can be grown in large quantities of identical cells, to produce large amounts 
of the desired foreign protein. For example, certain human proteins may be useful as drugs 
if they can be supplied in sufficient quantity to patients who have a protein deficiency. Such 
proteins may not easily or ethically be obtained by isolating them from human cells, nor can 
they easily be made by direct chemical synthesis or by growing them in isolated tissue 
cultures. Other proteins and enzymes are useful in industry. For example, certain enzymes 
can break down food products, and are useful in laundry detergent. However, commercial 
applications require large amounts of protein and a high degree of quality control. Desirable 
applications also require or would benefit from more active or more thermostable (heat 
resistant) proteins or enzymes. 

To solve some of these problems, recombinant genetic engineering techniques have 
been developed to use genetic machinery of other cells, such as bacteria and yeast, to produce 
human or other proteins. Selected genetic material, such as a polynucleotide that encodes a 
desired protein, is "recombined" with genetic material in a host cell, so that the host cell 
expresses the introduced foreign genetic material and produces the desired polypeptide or
protein. Bacteria, fungi and yeast can be suitable host cells because they are easy and
economical to grow and maintain in large quantities, and can be used to reliably and 
repeatably produce foreign proteins. Some proteins that are made by cells can be secreted or 
delivered outside the cell, which can improve the yield and the efficiency of subsequent 
isolation and purification steps. 

Directed evolution has been successfully applied to improve a variety of enzyme 
properties, such as substrate specificity, activity in organic solvents, and stability at high 
temperatures, which are often critical for industrial applications (5). This evolutionary 
approach uses DNA shuffling, for simultaneous random mutagenesis and recombination, to 
generate a variant having an improved desirable property over the existing wild type protein. 
Point mutations are generated due to the intrinsic infidelity of Taq-based polymerase chain 
reactions (PCR) associated with reassembly of nucleic acid sequences. In one example,
Stemmer and coworkers applied this technique to the gene encoding green fluorescent
protein (GFP), which resulted in a protein that folded better than the wild type in E. coli (10).
Other examples are in the literature (11-18, 21-25, 27-34, 47-58, 60-63, 65-75). Eukaryotic
enzymes have a myriad of existing and potential applications, but improvement of these and 
other proteins by directed evolution is desirable. For example, the difficulty of expressing 
certain oxidase enzymes in a facile expression host has posed technical challenges. Efforts 
to modify these enzymes for industrial applications by protein engineering methods have been 
impeded. Directed evolution, for example, exploits expression in a host such as E. coli or S.
cerevisiae, organisms in which large libraries of mutants or variants can be made. Also, the 
lack of efficient expression in an appropriate foreign (heterologous) host can prevent the mass 
production of some of these proteins on an economical scale. Thus, there continues to be a 
need for new ways to produce new proteins, and for new proteins and enzymes having new 
or enhanced biological properties. 
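
The combination of random point mutagenesis and recombination described above can be illustrated with a toy simulation. The following is only a sketch under stated assumptions: the parent sequence, the mutation rate, the fixed fragment length and the function names are all hypothetical, and real DNA shuffling relies on random fragmentation and homology-driven reassembly rather than the fixed-position crossover used here.

```python
import random

BASES = "ACGT"

def point_mutate(seq, rate, rng):
    """Copy seq, replacing each base with a different base with probability `rate`.

    This mimics, very loosely, the point mutations introduced by a
    low-fidelity polymerase during PCR.
    """
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def shuffle_parents(parents, fragment_len, rng):
    """Build a chimera by taking each fixed-length segment from a random parent.

    Simplifying assumption: all parents have equal length and crossovers
    occur only at segment boundaries.
    """
    length = len(parents[0])
    pieces = []
    for start in range(0, length, fragment_len):
        donor = rng.choice(parents)
        pieces.append(donor[start:start + fragment_len])
    return "".join(pieces)

rng = random.Random(0)                     # fixed seed for repeatability
parent = "ATGGCCTCAGCACCTATCGGAAGC"        # arbitrary illustrative sequence
mutants = [point_mutate(parent, 0.05, rng) for _ in range(4)]
library = [shuffle_parents(mutants, 6, rng) for _ in range(8)]

for variant in library:
    print(variant)
```

In an actual experiment each library member would then be expressed and screened for the desired property; the sketch stops at library generation.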

Galactose Oxidase Enzymes

One protein of interest is the oxidation enzyme galactose oxidase. Galactose oxidase 
(D-galactose: oxygen 6-oxidoreductase, GAO; EC 1.1.3.9) is an enzyme containing a single 
copper ion, and is secreted by a number of fungal species. Fusarium NRRL 2903, formerly 
known as Dactylium dendroides, has been the most extensively studied (76). The enzyme is
a glycoprotein with a carbohydrate content of about 1.7% and consists of a single polypeptide
chain of 639 amino acid residues with molecular mass of 68,000 Da (77, 78). The reaction
catalyzed by GAO is the oxidation of primary alcohols to the corresponding aldehydes,
coupled to the two-electron reduction of O2 to hydrogen peroxide (79).

The enzyme oxidizes an unusually broad range of substrates. It accepts D-galactose 
(FIG. 1), alpha- and beta-galactopyranosides, oligo- and polysaccharides and considerably 
smaller molecules, such as glycerol and allyl alcohol, as substrates (77, 80-82). GAO exhibits 
prochiral (only the pro-S hydrogen is abstracted) as well as enantiomeric specificity for
galactose (only D-galactose is oxidized by the enzyme) (80, 83). Furthermore, GAO strictly
discriminates against D-glucose, the C-4 epimer of D-galactose, as a substrate or ligand. D-glucose
does not bind to GAO at concentrations as high as 1 M (80, 84). The kinetic
parameters of GAO for the oxidation of galactose are: Km = 67 mM; kcat = 3,000 sec^-1;
kcat/Km = 45 x 10^3 M^-1 sec^-1 (85).
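
As a consistency check on the kinetic parameters quoted above, the ratio kcat/Km can be recomputed from Km = 67 mM and kcat = 3,000 sec^-1. The sketch below also evaluates a rate under the assumption of simple Michaelis-Menten kinetics, which the text does not state; the function names and the enzyme concentration are illustrative only.

```python
def catalytic_efficiency(kcat_per_s, km_molar):
    """Return kcat/Km in M^-1 s^-1."""
    return kcat_per_s / km_molar

def michaelis_menten_rate(substrate_molar, kcat_per_s, km_molar, enzyme_molar):
    """Initial rate (M/s) assuming simple Michaelis-Menten behaviour.

    v = kcat * [E] * [S] / (Km + [S])
    """
    return kcat_per_s * enzyme_molar * substrate_molar / (km_molar + substrate_molar)

KM_GALACTOSE = 0.067      # 67 mM expressed in mol/l, as quoted in the text
KCAT_GALACTOSE = 3000.0   # sec^-1, as quoted in the text

efficiency = catalytic_efficiency(KCAT_GALACTOSE, KM_GALACTOSE)
print(f"kcat/Km = {efficiency:.3g} M^-1 s^-1")

# Hypothetical enzyme concentration of 1 nM; at [S] = Km the rate is half of Vmax.
rate_at_km = michaelis_menten_rate(KM_GALACTOSE, KCAT_GALACTOSE, KM_GALACTOSE, 1e-9)
print(f"rate at [S] = Km: {rate_at_km:.3g} M/s")
```

The computed ratio is close to 45 x 10^3 M^-1 sec^-1, which agrees with the quoted value to two significant figures.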

The crystal structure of GAO has been reported (86). It consists of three
predominantly beta-structure domains. The copper ion lies on the solvent-accessible surface
of the second and largest domain (residues 156-532) (78, 87). Tyr-272, Tyr-495, His-496,
His-581 and a water molecule are the copper ligands at pH 7.0. The crystal structure also
reveals a novel thioether bond linking Cys-228 and Tyr-272 and supports the presence of a
tyrosine free radical at the active site (79). The active site structure of GAO is shown in FIG.
2. Site-directed mutagenesis of Tyr-495 and Cys-228 has confirmed their involvement in
catalysis (85, 88).

GAO is useful in a wide variety of applications, ranging from analytical and food
chemistry to chemoenzymatic synthesis and clinical testing. For example, biological sensors
based on GAO have been developed to determine the content of galactose (89), lactose and
other GAO substrates (90). Such biosensors have also been used for quality control in dairy
industries (91, 92), online bioprocess monitoring (93) and analysis of blood samples of
patients with suspected galactosemia (94). The stereospecificity and broad substrate
specificity of GAO have been exploited in the chemoenzymatic synthesis of L-sugars from
polyols (95), which are usually difficult to prepare by chemical methods (96, 97), as well as
sugar-containing polyamines (98) and 5-C-(hydroxymethyl)hexoses (99). GAO applications
in synthesis have been limited due to its relatively low activity toward a large number of
primary alcohols (100). GAO is also used for the detection of the disaccharide
D-galactose-beta-(1->3)-N-acetylgalactosamine (Gal-GalNAc), a tumor marker in colonic
cancer and precancer, and provides a cost-effective screening test for patients with neoplasia
or at risk of developing neoplasia (101, 102). GAO also finds applications in food
chemistry. For example, it has been used in oxidized guar manufacture (103) and to treat the
oligosaccharide fraction contained in honey (104). Finally, GAO is used to oxidize the cell 
surface polysaccharides of membrane-bound glycoproteins containing terminal non-reducing 
galactose residues: this is an essential step in the successful radiolabeling of these 
glycoconjugates (105, 106). 

Modified and particularly improved or optimized GAO enzymes are useful to improve 
and expand the use of the enzyme in practical applications. For example, enzymes of the 
invention include GAO variants that are more active, more thermostable, or both. Increased 
activity and/or expression as well as high thermostability may significantly decrease the cost 
of enzyme production, simplify its purification and handling, and prolong its shelf-life. Other 
properties of the enzyme may also be varied, for example to optimize activity towards 
particular substrates or toward other substrates such as polymeric materials and glucose.

Use of these evolved enzymes in biosensors and diagnostics can increase sensitivity, 
decrease the response time and enhance the detection range. In addition, a more stable 
enzyme will find applications in the construction of biosensors with prolonged stability. An 
evolved GAO with improved activity toward poor GAO substrates, such as allyl alcohol and 
glucose, will provide new and improved applications of the enzyme in organic synthesis and 
other sensor applications. For chemical synthesis applications, selective oxidation of alcohols 
to the corresponding aldehydes avoids the use of protecting groups, minimizes side reactions 
often observed in traditional chemical synthesis, and is an environmentally friendly process. 
Use of such GAO enzymes as a synthetic reagent would facilitate the use of more inexpensive, 
safe and biodegradable carbohydrate materials in industrial processes (107). 

A more efficient enzyme is expected to be advantageous in the food chemistry 
applications of GAO, and, in particular in the selective modification of guar and other 
carbohydrate-based polymers. GAO variants according to the invention would also be useful 
for modification of carbohydrate-based (e.g. cellulosic) textiles and other materials. The 
aldehyde function produced by the GAO can be used to couple other substances selectively
at the modified position on the polymer. 

Accordingly, there is a need to develop new and improved GAO enzymes, as well as 
methods for expressing such proteins. In particular, there is a need for protein expression 
methods which are well-suited for use in connection with directed evolution techniques. 

This invention describes methods for screening libraries of GAO mutants produced 
by error-prone PCR and DNA shuffling, to identify mutations that are expressed in bacteria 
(e.g. E. coli) and with improved GAO function. Micro-plate and membrane screening 
techniques are disclosed. In one embodiment, the mutant is a functional and active galactose 
oxidase (GAO) that is expressed in E. coli at levels of about 65 times the activity of a parent
recombinant wild type (for D-galactose). The activity for other substrates, such as allyl 
alcohol, is also about 65 times that of wild type. Mutants of the invention can have any 
fraction or multiple of the corresponding wild type activity, but preferably are more active; 
e.g. about 2 to 200 times as active. Mutants also are more thermostable. Enzyme yield is 
generally at least about 10 mg/l.

SUMMARY OF THE INVENTION 

The observed constraints on the use of native proteins are thought to be a 
consequence of evolution. Proteins have evolved in the context and environment of a 
living organism, to carry out specific biological functions under conditions conducive to 
life - not in the laboratory or under industrial conditions. In some cases, evolution may 
favor or even require less than optimally efficient enzymes. The output, efficiency, 
working conditions, stability and other properties of known expression systems are not 
thought to be unalterable, nor are they limitations which should be seen as intrinsic to the 
nature of cellular expression systems. It is possible that the proteins used in these systems 
can be evolved in vitro, or that analogous proteins can be otherwise developed, to alter or 
enhance the protein's properties, for example, to obtain much more efficient expression, 
activity and thermostability. Improved proteins can also be obtained by screening cultures 
of native organisms or expressed gene libraries (3). 

The invention provides a method for improving the expression, thermostability, 
and/or the activity toward one or more substrates, of a polynucleotide encoding oxidase 
enzymes by using directed evolution. The invention also provides polynucleotides
encoding for variant oxidase enzymes which have improved properties in conventional 
expression systems. According to one embodiment of the invention, directed evolution or 
random mutagenesis is used to produce GAO variants which are more highly expressed, 
more active, and/or more thermostable in prokaryotic expression systems such as E. coli. 

The above features and many other attendant advantages of the invention will 
become better understood by reference to the following detailed description when taken in 
conjunction with the accompanying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 
FIG. 1 shows a reaction scheme in which a D-galactose substrate is oxidized to 
produce a D-galactohexodialdose product, in the presence of galactose oxidase (GAO) 
enzyme. 

FIG. 2 shows the active site structure of GAO at pH 7.0.

FIG. 3 is a graph showing the effect of metal ions (particularly copper ions) on the 
activity of a recombinant wild-type GAO, pGAO-010. Enzyme solutions with additives
were kept at 4 °C for 1 hr before assay. Relative activity of enzyme solution with 1 mM 
copper (II) sulfate was estimated as 100 %. 

FIG. 4 is a graph showing GAO activity for various clones generated by error-prone
PCR, with varying concentrations of MnCl2, using conditions A of TABLE 3.

FIG. 5 is a graph showing GAO activity for various clones generated by error-prone
PCR, with varying concentrations of MnCl2, using conditions C of TABLE 3.

FIG. 6 shows the sequences of PCR primers used herein for amplification, e.g. of
the whole galactose oxidase gene.

FIG. 7 is a schematic representation of the construction of plasmid pUC18-EHL.

FIG. 8 is a schematic representation of the construction of plasmid pGAO-010. 

FIG. 9 is a schematic representation of the construction of plasmids pGAO-027
and pGAO-036.

FIG. 10 is a schematic representation of the construction of plasmids pGAO-006
and pGAO-011.

FIG. 11 shows the structures and activities of representative plasmids encoding
GAO according to the invention, with IPTG-induced expression in host E. coli. Permeable
cells which were treated by freeze (-20 °C), thaw (4 °C) and 0.5 mg/l lysozyme for 30
minutes at 37 °C were used for assay. An activity given as * indicates that cells did not
grow in test tube culture; ** indicates that a transformant was not obtained.

FIG. 12 shows a scheme for the design of plasmids according to the invention. 

FIG. 13 shows the structures and activities of additional plasmids encoding GAO
according to the invention, with IPTG-induced expression in host E. coli.

FIG. 14 is a graph comparing the GAO activities of GAO plasmids with and
without random codon alternation. 

FIG. 15 shows substrate specificities for a wild type galactose oxidase and a 
recombinant galactose oxidase enzyme of the invention. Partially purified galactose 
oxidase from D. dendroides (Sigma) and cell-free extract from E. coli BL21(DE3)/pGAO-010
were used. Relative activities for D-galactose were estimated as 100 %. (+) indicates
that oxidation was detected, but activities were too low to be estimated. n.d. indicates that 
activities were not distinguishable from background absorbance levels. 

FIG. 16 is a graph showing the thermal stability of selected GAO mutants. 

FIGS. 17A-C show the sequence of representative mutant 9.16.8D2 of the
invention [SEQ. ID NO. 10] 

FIGS. 18A-C show the sequence of representative mutant 9.16.6C11 of the
invention [SEQ. ID NO. 11] 

FIGS. 19A-C show the sequence of representative mutant 9.16.16D12 of the 
invention [SEQ. ID NO. 12] 

FIGS. 20A-C show the sequence of representative mutant 11.03.6D3 of the 
invention [SEQ. ID NO. 13] 

FIGS. 21A-C show the sequence of representative mutant 11.03.10C3 of the
invention [SEQ. ID NO. 14] 

FIGS. 22A-C show the sequence of representative mutant 11.03.10D6 of the
invention [SEQ. ID NO. 15] 

FIGS. 23A-C show the sequence of representative mutant 11.03.13E12 of the 
invention [SEQ. ID NO. 16] 
FIGS. 24A-C show the sequence of representative mutant 1.06.20E7 of the
invention [SEQ. ID NO. 17]

FIGS. 25A-C show the sequence of representative mutant 1.D4 of the invention
[SEQ. ID NO. 18] 

FIGS. 26A-C show the sequence of representative mutant 2G4 of the invention 
[SEQ. ID NO. 19] 

FIGS. 27A-C show the sequence of representative mutant 3.H7 of the invention 
[SEQ. ID NO. 20] 

FIGS. 28A-C show the sequence of representative mutant 4.F12 of the invention 
[SEQ. ID NO. 21] 

DETAILED DESCRIPTION OF THE INVENTION 

This invention concerns methods for improving the expression, activity and/or 
thermostability of proteins using facile or conventional expression systems. 

Definitions 

As used herein, "about" or "approximately" shall mean within 20 percent,
preferably within 10 percent, and more preferably within 5 percent of a given value or 
range. 
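
The tolerance in this definition can be expressed as a simple check. The following is a minimal sketch; the function name and the example values are hypothetical, and only the 20, 10 and 5 percent thresholds come from the definition above.

```python
def is_about(value, reference, tolerance=0.20):
    """Return True if `value` lies within `tolerance` (as a fraction) of `reference`.

    The default of 0.20 corresponds to "within 20 percent"; pass 0.10 or 0.05
    for the preferred and more preferred ranges.
    """
    return abs(value - reference) <= tolerance * abs(reference)

# Example: is a measured 70-fold improvement "about 65 times"?
print(is_about(70, 65))          # broadest sense, within 20 percent
print(is_about(70, 65, 0.05))    # narrowest sense, within 5 percent
```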

The term "substrate" means any substance or compound that is converted or meant 
to be converted into another compound by the action of an enzyme catalyst. The term
includes aromatic and aliphatic compounds, and includes not only a single compound, but 
also combinations of compounds, such as solutions, mixtures and other materials which 
contain at least one substrate. 

An "oxidation reaction" or "oxygenation reaction", as used herein, is a chemical or 
biochemical reaction involving the addition of oxygen to a substrate, to form an 
oxygenated or oxidized substrate or product. An oxidation reaction is typically 
accompanied by a reduction reaction (hence the term "redox" reaction, for oxidation and 
reduction). A compound is "oxidized" when it receives oxygen or loses electrons. A 
compound is "reduced" when it loses oxygen or gains electrons. GAO typically catalyzes 
the oxidation of a primary alcohol group to an aldehyde. 



WO 01/88110 



PCT/US00/32345 




The term "enzyme" means any substance composed wholly or largely of protein or 
polypeptides that catalyzes or promotes, more or less specifically, one or more chemical or 
biochemical reactions. 

A "polypeptide" (one or more peptides) is a chain of chemical building blocks
called amino acids that are linked together by chemical bonds called peptide bonds. A
protein or polypeptide, including an enzyme, may be "native" or "wild-type", meaning that
it occurs in nature or has the amino acid sequence of a native protein, respectively. These
terms are sometimes used interchangeably. A polypeptide may or may not be
glycosylated. A "recombinant wild-type" typically means the wild type sequence in a
recombinant host without glycosylation. Comparisons in the examples and figures of this
application are generally with reference to a wild type that is a recombinant wild type. A
polypeptide may also be a "mutant", "variant" or "modified", meaning that it has been
made, altered, derived, or is in some way different or changed from a native protein, or
from another mutant. A native wild type protein comprises the natural sequence of amino
acids in the polypeptide and typically includes glycosylation. A "parent" polypeptide or
enzyme is any polypeptide or enzyme from which any other polypeptide or enzyme is
derived or made, using any methods, tools or techniques, and whether or not the parent is
itself a native or mutant polypeptide or enzyme. A parent polynucleotide is one that
encodes a parent polypeptide. A "test enzyme" is a protein-containing substance that is
tested to determine whether it has properties of an enzyme. The term "enzyme" can also
refer to a catalytic polynucleotide (e.g. RNA or DNA).

The "activity" of an enzyme is a measure of its ability to catalyze a reaction, and
may be expressed as the rate at which the product of the reaction is produced. For
example, enzyme activity can be represented as the amount of product produced per unit of
time, per unit (e.g. concentration or weight) of enzyme. The "stability" of an enzyme
means its ability to function, over time, in a particular environment or under particular
conditions. One way to evaluate stability is to assess its ability to resist a loss of activity
over time, under given conditions. Enzyme stability can also be evaluated in other ways,
for example, by determining the relative degree to which the enzyme is in a folded or
unfolded state. Thus, one enzyme is more stable than another, or has improved stability,
when it is more resistant than the other enzyme to a loss of activity under the same
conditions, is more resistant to unfolding, or is more durable by any suitable measure. For
example, a more "thermally stable" or "thermostable" enzyme is one that is more resistant
to loss of structure (unfolding) or function (enzyme activity) when exposed to heat or an
elevated temperature. One way to evaluate this is to determine the "melting temperature"
or Tm for the protein. The melting temperature, also called a midpoint, is the temperature
at which half of the protein is unfolded from its fully folded state. This midpoint is
typically determined by calculating the midpoint of a titration curve that plots protein
unfolding as a function of temperature. Thus, a protein with a higher Tm requires more
heat to cause unfolding and is more stable or more thermostable. Stated another way, a
higher Tm indicates that fewer molecules of that protein are unfolded at a given
temperature than for a protein with a lower Tm, again meaning that the protein which is
more resistant to unfolding is more stable (it has less unfolding at the same temperature).
Another measure of stability is T1/2 or T50, which is the transition midpoint of the
inactivation curve of the protein as a function of temperature. T1/2 is the temperature at
which the protein loses half of its activity. Thus, a protein with a higher T1/2 requires more
heat to deactivate it, and is more stable or more thermostable. Stated another way, a
higher T1/2 indicates that fewer molecules of that protein are inactive at a given
temperature than for a protein with a lower T1/2, again meaning that the protein which is
more resistant to deactivation is more stable (it has more activity at the same temperature).
These assays are also called "thermal shift" assays, because the inactivation or unfolding
curve, plotted against temperature, is "shifted" to higher or lower temperatures when
stability increases or decreases. Thermostability can also be measured in other ways. For
example, a longer half-life (t1/2) for the enzyme's activity at elevated temperature is an
indication of thermostability.
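
The midpoint and half-life measures described above can be estimated from assay data in a few lines. The sketch below is illustrative only: the residual-activity values are invented for the example, linear interpolation between measured points is an assumed (and simple) way of locating the midpoint, and the half-life formula assumes first-order inactivation, which the text does not specify.

```python
import math

def midpoint_temperature(temps_c, residual_activity):
    """Interpolate the temperature at which residual activity falls to 0.5.

    temps_c must be increasing; residual_activity is the fraction of the
    initial activity remaining after incubation at each temperature.
    """
    points = list(zip(temps_c, residual_activity))
    for (t_lo, a_lo), (t_hi, a_hi) in zip(points, points[1:]):
        if a_lo >= 0.5 >= a_hi and a_lo != a_hi:
            return t_lo + (a_lo - 0.5) * (t_hi - t_lo) / (a_lo - a_hi)
    raise ValueError("curve does not cross 50 % residual activity")

def half_life(rate_constant):
    """Half-life for first-order inactivation: t1/2 = ln(2) / k."""
    return math.log(2) / rate_constant

# Hypothetical inactivation curves (not data from this document).
temps = [40, 50, 60, 70, 80]
parent_curve = [1.00, 0.95, 0.60, 0.20, 0.02]
variant_curve = [1.00, 0.98, 0.85, 0.55, 0.10]

t_half_parent = midpoint_temperature(temps, parent_curve)
t_half_variant = midpoint_temperature(temps, variant_curve)
print(f"parent  T1/2 = {t_half_parent:.1f} C")
print(f"variant T1/2 = {t_half_variant:.1f} C")
print(f"shift        = {t_half_variant - t_half_parent:.1f} C")
```

A positive shift corresponds to the inactivation curve moving to higher temperatures, i.e. to a more thermostable variant in the sense defined above.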

An "oxidation enzyme" is an enzyme that catalyzes one or more oxidation
reactions, typically by adding, inserting, contributing or transferring oxygen from a source
or donor to a substrate. Such enzymes are also called oxidoreductases or redox enzymes,
and encompass oxygenases, hydrogenases or reductases, oxidases and peroxidases.

The terms "oxygen donor", "oxidizing agent" and "oxidant" mean a substance,
molecule or compound which donates oxygen to a substrate in an oxidation reaction.
Typically, the oxygen donor is reduced (accepts electrons). Exemplary oxygen donors,
which are not limiting, include molecular oxygen or dioxygen (O2) and peroxides,
including alkyl peroxides such as t-butyl peroxide, and most preferably hydrogen peroxide
(H2O2). A peroxide is any compound having two oxygen atoms bound to each other.

A "luminescent" substance means any substance which produces detectable 
electromagnetic radiation, or a change in electromagnetic radiation, most notably visible 
light, by any mechanism, including color change, UV absorbance, fluorescence and 
phosphorescence. Preferably, a luminescent substance according to the invention produces 
a detectable color, fluorescence or UV absorbance. The term "chemiluminescent agent"
means any luminescent substance which enhances the detectability of a luminescent (e.g.,
fluorescent) signal, for example by increasing the strength or lifetime of the signal. One
exemplary and preferred chemiluminescent agent is azinobis(3-ethylbenzothiazoline-6-sulfonic
acid) (ABTS). Others include 5-amino-2,3-dihydro-1,4-phthalazinedione (luminol) and analogs,
1,2-dioxetanes such as tetramethyl-1,2-dioxetane (TMD), 1,2-dioxetanones, and
1,2-dioxetanediones, o-anisidine, o-dianisidine, and o-tolidine. Another term for these kinds
of materials is "chromogen." 

The term "polymer" means any substance or compound that is composed of two or 
more building blocks ('mers') that are repetitively linked to each other. For example, a 
"dimer" is a compound in which two building blocks have been joined together. 

The term "cofactor" means any non-protein substance that is necessary or 
beneficial to the activity of an enzyme. A "coenzyme" means a cofactor that interacts 
directly with and serves to promote a reaction catalyzed by an enzyme. Many coenzymes 
serve as carriers. For example, NAD+ and NADP+ carry hydrogen atoms from one enzyme
to another. An "ancillary protein" means any protein substance that is necessary or 
beneficial to the activity of an enzyme. 

The term "host cell" means any cell of any organism that is selected, modified, 
transformed, grown, or used or manipulated in any way, for the production of a substance 
by the cell, for example the expression by the cell of a gene, a DNA or RNA sequence, a 
protein or an enzyme. 

"DNA" (deoxyribonucleic acid) means any chain or sequence of the chemical 
building blocks adenine (A), guanine (G), cytosine (C) and thymine (T), called nucleotide 
bases, that are linked together on a deoxyribose sugar backbone. DNA can have one
strand of nucleotide bases, or two complementary strands which may form a double helix
structure. "RNA" (ribonucleic acid) means any chain or sequence of the chemical building
blocks adenine (A), guanine (G), cytosine (C) and uracil (U), called nucleotide bases, that 
are linked together on a ribose sugar backbone. RNA typically has one strand of 
nucleotide bases. 

A "polynucleotide" or "nucleotide sequence" is a series of nucleotide bases (also 
called "nucleotides") in DNA and RNA, and means any chain of two or more nucleotides. 
A nucleotide sequence typically carries genetic information, including the information 
used by cellular machinery to make proteins and enzymes. These terms include double or 
single stranded genomic and cDNA, RNA, any synthetic and genetically manipulated 
polynucleotide, and both sense and anti-sense polynucleotide (although only sense strands
are being represented herein). This includes single- and double-stranded molecules, i.e.,
DNA-DNA, DNA-RNA and RNA-RNA hybrids, as well as "protein nucleic acids" (PNA) 
formed by conjugating bases to an amino acid backbone. This also includes nucleic acids 
containing modified bases, for example thio-uracil, thio-guanine and fluoro-uracil. 

The polynucleotides herein may be flanked by natural regulatory sequences, or may 
be associated with heterologous sequences, including promoters, enhancers, response 
elements, signal sequences, polyadenylation sequences, introns, 5'- and 3'- non-coding 
regions, and the like. The nucleic acids may also be modified by many means known in 
the art. Non-limiting examples of such modifications include methylation, "caps", 
substitution of one or more of the naturally occurring nucleotides with an analog, and 
internucleotide modifications such as, for example, those with uncharged linkages (e.g., 
methyl phosphonates, phosphotriesters, phosphoroamidates, carbamates, etc.) and with 
charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.). Polynucleotides 
may contain one or more additional covalently linked moieties, such as, for example, 
proteins (e.g., nucleases, toxins, antibodies, signal peptides, poly-L-lysine, etc.), 
intercalators (e.g., acridine, psoralen, etc.), chelators (e.g., metals, radioactive metals, iron, 
oxidative metals, etc.), and alkylators. The polynucleotides may be derivatized by 
formation of a methyl or ethyl phosphotriester or an alkyl phosphoramidate linkage. 
Furthermore, the polynucleotides herein may also be modified with a label capable of 
providing a detectable signal, either directly or indirectly. Exemplary labels include 
radioisotopes, fluorescent molecules, biotin, and the like. 

Proteins and enzymes are made in the host cell using instructions in DNA and 
RNA, according to the genetic code. Generally, a DNA sequence having instructions for a 
particular protein or enzyme is "transcribed" into a corresponding sequence of RNA. The 
RNA sequence in turn is "translated" into the sequence of amino acids which form the 
protein or enzyme. An "amino acid sequence" is any chain of two or more amino acids. 
Each amino acid is represented in DNA or RNA by one or more triplets of nucleotides. 
Each triplet forms a codon, corresponding to an amino acid. For example, the amino acid 
lysine (Lys) can be coded by the nucleotide triplet or codon AAA or by the codon AAG. 
(The genetic code has some redundancy, also called degeneracy, meaning that most amino 
acids have more than one corresponding codon.) Because the nucleotides in DNA and 
RNA sequences are read in groups of three for protein production, it is important to begin 
reading the sequence at the correct nucleotide, so that the correct triplets are read. The 
way that a nucleotide sequence is grouped into codons is called the "reading frame." 
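
The codon and reading-frame concepts above can be illustrated with a short Python sketch. It is illustrative only and not part of the disclosed methods: the function name `translate` and the example sequence are hypothetical, and the table is the standard genetic code written with one-letter amino acid symbols (K is lysine, * is a stop codon).

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna, frame=0):
    """Translate a DNA coding strand, reading triplets from offset `frame` (0, 1 or 2)."""
    triplets = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[t] for t in triplets)


# Hypothetical example sequence: ATG AAA AAG TAA
sequence = "ATGAAAAAGTAA"
print(translate(sequence, frame=0))  # both lysine codons (AAA, AAG) give K
print(translate(sequence, frame=1))  # same nucleotides, grouped into different codons
```

Shifting the starting position by a single nucleotide changes every codon that is read, which is why the reading frame must be set correctly.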

The term "gene", also called a "structural gene", means a DNA sequence that codes 
for, or corresponds to, a particular sequence of amino acids which comprise all or part of 
one or more proteins or enzymes, and may or may not include regulatory DNA sequences, 
such as promoter sequences, which determine for example the conditions under which the 
gene is expressed. Some genes, which are not structural genes, may be transcribed from 
DNA to RNA, but are not translated into an amino acid sequence. Other genes may 
function as regulators of structural genes or as regulators of DNA transcription. 

A "coding sequence" or a sequence "encoding" a polypeptide, protein or enzyme is 
a nucleotide sequence that, when expressed, results in the production of that polypeptide, 
protein or enzyme, i.e., the nucleotide sequence encodes an amino acid sequence for that 
polypeptide, protein or enzyme. A coding sequence is "under the control" of 
transcriptional and translational control sequences in a cell when RNA polymerase 
transcribes the coding sequence into mRNA, which is then trans-RNA spliced and 
translated into the protein encoded by the coding sequence. Preferably, the coding 
sequence is a double-stranded DNA sequence which is transcribed and translated into a 
polypeptide in a cell in vitro or in vivo when placed under the control of appropriate 



WO 01/88110 



PCT/US00/32345 



-15- 

regulatory sequences. The boundaries of the coding sequence are determined by a start 
codon at the 5' (amino) terminus and a translation stop codon at the 3' (carboxyl) terminus. 
A coding sequence can include, but is not limited to, prokaryotic sequences, cDNA from 
eukaryotic mRNA, genomic DNA sequences from eukaryotic (e.g., mammalian) DNA, 
and even synthetic DNA sequences. If the coding sequence is intended for expression in a 
eukaryotic cell, a polyadenylation signal and transcription termination sequence will 
usually be located 3' to the coding sequence. 
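
As a minimal sketch of how the boundaries of a coding sequence can be located, the following Python function scans a DNA coding strand for the first ATG start codon that is followed, in the same reading frame, by a stop codon. The function name and the example sequence are hypothetical, and the sketch ignores introns, alternative start codons and the opposite strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_coding_sequence(dna):
    """Return (start, end) of the first ATG...stop open reading frame, or None.

    `start` is the index of the A of ATG; `end` is exclusive and includes
    the stop codon, so dna[start:end] is the bounded coding sequence.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Walk codon by codon in the frame set by this start codon.
        for i in range(start + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return start, i + 3
        start = dna.find("ATG", start + 1)
    return None


example = "CCATGAAAGGCTAAGG"  # hypothetical sequence
bounds = find_coding_sequence(example)
if bounds is not None:
    print(example[bounds[0]:bounds[1]])
```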

Transcriptional and translational control sequences are DNA regulatory sequences, 
such as promoters, enhancers, terminators, and the like, that provide for the expression of a 
coding sequence in a host cell. In eukaryotic cells, polyadenylation signals are control 
sequences. 

A "promoter sequence" is a DNA regulatory region capable of binding RNA 
polymerase in a cell and initiating transcription of a downstream (3' direction) coding 
sequence. For purposes of defining this invention, the promoter sequence is bounded at its 
3' terminus by the transcription initiation site and extends upstream (5' direction) to 
include the minimum number of bases or elements necessary to initiate transcription at 
levels detectable above background. Within the promoter sequence will be found a 
transcription initiation site (conveniently defined for example, by mapping with nuclease 
S1), as well as protein binding domains (consensus sequences) responsible for the binding 
of RNA polymerase. As described above, promoter DNA is a DNA sequence which 
initiates, regulates, or otherwise mediates or controls the expression of the coding DNA. 
A promoter may be "inducible", meaning that it is influenced by the presence or amount of 
another compound (an "inducer"). For example, an inducible promoter includes those 
which initiate or increase the expression of a downstream coding sequence in the presence 
of a particular inducer compound. A "leaky" inducible promoter is a promoter that 
provides a high expression level in the presence of an inducer compound and a 
comparatively very low expression level, and at minimum a detectable expression level, in 
the absence of the inducer. 

A "signal sequence" is included at the beginning of the coding sequence of a 
protein to be expressed in the periplasmic space, or outside the cell. This sequence 
encodes a signal peptide, N-terminal to the mature polypeptide, that directs the host cell to 
translocate the polypeptide. The term "translocation signal sequence" is also used to refer 
to a signal sequence. Translocation signal sequences can be found associated with a 
variety of proteins native to eukaryotes and prokaryotes, and are often functional in both 
types of organisms. Proteins of the invention may be further modified and improved by 
adding a sequence which directs the secretion of the protein outside the host cell. The 
addition of the signal sequence does not interfere with the folding of the secreted protein, 
and evidence thereof is easily tested for using techniques known in the art and depending 
on the protein (e.g., tests for activity of a given protein after modification). 

The terms "express" and "expression" mean allowing or causing the information in 
a gene or DNA sequence to become manifest, for example producing a protein by 
activating the cellular functions involved in transcription and translation of a 
corresponding gene or DNA sequence. A DNA sequence is expressed in or by a cell to 
form an "expression product" such as a protein. The expression product itself, e.g. the 
resulting protein, may also be said to be "expressed" by the cell. A polynucleotide or 
polypeptide is expressed recombinantly, for example, when it is expressed or produced in 
a foreign host cell under the control of a foreign or native promoter, or in a native host cell 
under the control of a foreign promoter. 

A polynucleotide or polypeptide is "over-expressed" when it is expressed or 
produced in an amount or yield that is substantially higher than a given base-line yield, e.g. 
a yield that occurs in nature. For example, a polypeptide is over-expressed when the yield 
is substantially greater than the normal, average or base-line yield of the native 
polypeptide in native host cells under given conditions, for example conditions 
suitable to the life cycle of the native host cells. Over-expression of a polypeptide can be 
obtained, for example, by altering any one or more of: (a) the growth or living conditions 
of the host cells; (b) the polynucleotide encoding the polypeptide to be over-expressed; (c) 
the promoter used to control expression of the polynucleotide; and (d) the host cells 
themselves. This is a relative term, and thus "over-expression" can also be used to compare or 
distinguish the expression level of one polypeptide to another, without regard for whether 
either polypeptide is a native polypeptide or is encoded by a native polynucleotide. 
Typically, over-expression means a yield that is at least about two times a normal, average 
or given base-line yield. Thus, a polypeptide is over-expressed when it is produced in an 
amount or yield that is substantially higher than the amount or yield of a parent 
polypeptide or under parent conditions. Likewise, a polypeptide is "under-expressed" 
when it is produced in an amount or yield that is substantially lower than the amount or 
yield of a parent polypeptide or under parent conditions, e.g. no more than half the base-line 
yield. In this context, the expression level or yield refers to the amount or concentration of 
polynucleotide that is expressed, or polypeptide that is produced (i.e. expression product), 
whether or not in an active or functional form. As one example, a polynucleotide or 
polypeptide may be said to be under-expressed when it is expressed in detectable amounts 
under the control of an inducible promoter, but without induction, i.e. in the absence of an 
inducer compound. 
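
The relative nature of these terms can be sketched as a simple ratio test. The Python function below is illustrative only: the function name and the label "comparable" are hypothetical, the two-fold cut-off follows the "at least about two times" guideline above, and the under-expression cut-off reads the example above as a yield of no more than half the base line, which is an interpretive assumption.

```python
def classify_expression(yield_value, baseline, over_fold=2.0, under_fold=0.5):
    """Classify a yield relative to a base-line yield measured in the same units."""
    if baseline <= 0:
        raise ValueError("baseline yield must be positive")
    ratio = yield_value / baseline
    if ratio >= over_fold:
        return "over-expressed"
    if ratio <= under_fold:
        return "under-expressed"
    return "comparable"


# Hypothetical yields, e.g. in mg of expression product per liter of culture.
print(classify_expression(10.0, 4.0))
print(classify_expression(2.0, 4.0))
print(classify_expression(5.0, 4.0))
```

Because only the ratio matters, the same test can compare a variant against a parent polypeptide or one set of conditions against another.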

An expression product can be characterized as intracellular, extracellular or 
secreted. The term "intracellular" means something that is inside a cell. The term 
"extracellular" means something that is outside a cell. A substance is "secreted" by a cell 
if it is delivered to the periplasm or outside the cell, from somewhere on or inside the cell. 

As used herein, the terms "expression-resistant polypeptide" and "resistant to 
functional expression" are synonymous and refer to a polypeptide that is difficult to 
functionally express in selected host cells. For example, an expression-resistant 
polypeptide is not produced, or is produced in very low yield or in non-functional form, 
when a polynucleotide encoding that polypeptide is transformed or introduced into host 
cells, e.g. into a facile host cell expression system. 

The term "transformation" means the introduction of a "foreign" (i.e. extrinsic or 
extracellular) gene, DNA or RNA sequence to a host cell, so that the host cell will express 
the introduced gene or sequence to produce a desired substance, typically a protein or 
enzyme coded by the introduced gene or sequence. The introduced gene or sequence may 
also be called a "cloned" or "foreign" gene or sequence, and may include regulatory or control 
sequences, such as start, stop, promoter, signal, secretion, or other sequences used by a 
cell's genetic machinery. The gene or sequence may include nonfunctional sequences or 
sequences with no known function. A host cell that receives and expresses introduced 
DNA or RNA has been "transformed" and is a "transformant" or a "clone." The DNA or 
RNA introduced to a host cell can come from any source, including cells of the same 
genus or species as the host cell, or cells of a different genus or species. 

The terms "vector", "cloning vector" and "expression vector" mean the vehicle by 
which a DNA or RNA sequence (e.g. a foreign gene) can be introduced into a host cell, so 
as to transform the host and promote expression (e.g. transcription and translation) of the 
introduced sequence. 

Vectors typically comprise the DNA of a transmissible agent, into which foreign 
DNA is inserted. A common way to insert one segment of DNA into another segment of 
DNA involves the use of enzymes called restriction enzymes that cleave DNA at specific 
sites (specific groups of nucleotides) called restriction sites. Generally, foreign DNA is 
inserted at one or more restriction sites of the vector DNA, and then is carried by the 
vector into a host cell along with the transmissible vector DNA. A segment or sequence of 
DNA having inserted or added DNA, such as an expression vector, can also be called a 
"DNA construct." 

A common type of vector is a "plasmid", which generally is a self-contained 
molecule of double-stranded DNA that can readily accept additional (foreign) DNA and 
which can readily be introduced into a suitable host cell. A plasmid vector often contains 
coding DNA and promoter DNA and has one or more restriction sites suitable for inserting 
foreign DNA. Promoter DNA and coding DNA may be from the same gene or from 
different genes, and may be from the same or different organisms. A large number of 
vectors, including plasmid and fungal vectors, have been described for replication and/or 
expression in a variety of eukaryotic and prokaryotic hosts. Non-limiting examples 
include pKK plasmids (Clontech), pUC plasmids, pET plasmids (Novagen, Inc., 
Madison, WI), pRSET or pREP plasmids (Invitrogen, San Diego, CA), or pMAL 
plasmids (New England Biolabs, Beverly, MA), and many appropriate host cells, using 
methods disclosed or cited herein or otherwise known to those skilled in the relevant art. 

Recombinant cloning vectors will often include one or more replication systems for 
cloning or expression, one or more markers for selection in the host, e.g. antibiotic 
resistance, and one or more expression cassettes. Routine experimentation in 
biotechnology can be used to determine which vectors are best suited for use with the 
invention. In general, the choice of vector depends on the size of the polynucleotide 
sequence and the host cell to be employed in the methods of this invention. 

A "cassette" refers to a segment of DNA that can be inserted into a vector at 
specific restriction sites. The segment of DNA encodes a polypeptide of interest, and the 
cassette and restriction sites are designed to ensure insertion of the cassette in the proper 
reading frame for transcription and translation. 
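
Locating restriction sites, as used for inserting a cassette into a vector, can be sketched as a string search. The Python below is illustrative only: the function names and the example sequence are hypothetical, and the three enzymes listed (EcoRI, BamHI and HindIII, each cutting after the first base of its six-base recognition sequence on the top strand) are well-known examples chosen for illustration rather than enzymes named in this description.

```python
# Recognition sequence and top-strand cut offset for a few well-known enzymes.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}


def find_sites(dna, enzyme):
    """Return the start index of every occurrence of the enzyme's recognition sequence."""
    site, _ = ENZYMES[enzyme]
    positions = []
    i = dna.find(site)
    while i != -1:
        positions.append(i)
        i = dna.find(site, i + 1)
    return positions


def digest(dna, enzyme):
    """Cut the top strand of a linear DNA at each site and return the fragments."""
    _, offset = ENZYMES[enzyme]
    cuts = [p + offset for p in find_sites(dna, enzyme)]
    bounds = [0] + cuts + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]


vector = "AAGAATTCTTGGATCCAA"  # hypothetical sequence with one EcoRI and one BamHI site
print(find_sites(vector, "EcoRI"), digest(vector, "EcoRI"))
```

The sketch models only the top strand; the overhangs left on the complementary strand, which determine whether two cut ends are compatible, are not represented.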

The term "expression system" means a host cell and compatible vector under 
suitable conditions, e.g. for the expression of a protein coded for by foreign DNA carried 
by the vector and introduced to the host cell. Common expression systems include 
bacteria (e.g. E. coli and B. subtilis) or yeast (e.g. S. cerevisiae) host cells and plasmid 
vectors, and insect host cells and Baculovirus vectors. As used herein, a "facile expression 
system" means any expression system that is foreign or heterologous to a selected 
polynucleotide or polypeptide, and which employs host cells that can be grown or 
maintained more advantageously than cells that are native or heterologous to the selected 
polynucleotide or polypeptide, or which can produce the polypeptide more efficiently or in 
higher yield. For example, the use of robust prokaryotic cells to express a protein of 
eukaryotic origin would be a facile expression system. Preferred facile expression systems 
include E. coli, B. subtilis and S. cerevisiae host cells and any suitable vector. 

The terms "mutant" and "mutation" mean any detectable change in genetic 
material, e.g. DNA, or any process, mechanism, or result of such a change. This includes 
gene mutations, in which the structure (e.g. DNA sequence) of a gene is altered, any gene 
or DNA arising from any mutation process, and any expression product (e.g. protein or 
enzyme) expressed by a modified gene or DNA sequence. The term "variant" may also be 
used to indicate a modified or altered gene, DNA sequence, enzyme, cell, etc., i.e., any 
kind of mutant. Such changes also include changes in the promoter, ribosome binding 
site, etc. 

"Sequence-conservative variants" of a polynucleotide sequence are those in which 
a change of one or more nucleotides in a given codon position results in no alteration in 
the amino acid encoded at that position. 
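
A sequence-conservative change can be checked by translating both sequences and comparing the encoded amino acids. The Python sketch below is illustrative only: the function name and the example sequences are hypothetical, and the table is the standard genetic code with one-letter amino acid symbols.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def is_sequence_conservative(parent, variant):
    """True if two in-frame coding sequences differ in nucleotides but not in encoded amino acids."""
    if len(parent) != len(variant) or len(parent) % 3 != 0:
        raise ValueError("sequences must be in frame and of equal length")
    same_protein = all(
        CODON_TABLE[parent[i:i + 3]] == CODON_TABLE[variant[i:i + 3]]
        for i in range(0, len(parent), 3)
    )
    return same_protein and parent != variant


# AAA -> AAG keeps lysine; AAA -> AGA changes lysine to arginine.
print(is_sequence_conservative("ATGAAATAA", "ATGAAGTAA"))
print(is_sequence_conservative("ATGAAATAA", "ATGAGATAA"))
```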

"Function-conservative variants" are those in which a given amino acid residue in a 
protein or enzyme has been changed without altering the overall conformation and 
function of the polypeptide, including, but not limited to, replacement of an amino acid 
with one having similar properties (such as, for example, acidic, basic, hydrophobic, and 
the like). Amino acids with similar properties are well known in the art. For example, 
arginine, histidine and lysine are hydrophilic-basic amino acids and may be 
interchangeable. Similarly, isoleucine, a hydrophobic amino acid, may be replaced with 
leucine, methionine or valine. Amino acids other than those indicated as conserved may 
differ in a protein or enzyme so that the percent protein or amino acid sequence similarity 
between any two proteins of similar function may vary and may be, for example, from 
70% to 99% as determined according to an alignment scheme such as by the Cluster 
Method, wherein similarity is based on the MEGALIGN algorithm. A "function-conservative 
variant" also includes a polypeptide or enzyme which has at least 60% amino 
acid identity as determined by BLAST or FASTA algorithms, preferably at least 75%, 
more preferably at least 85%, and most preferably at least 90%, and which has the 
same or substantially similar properties or functions as the native or parent protein or 
enzyme to which it is compared. 
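
The identity thresholds above can be illustrated with a simplified calculation. The Python sketch below assumes the two sequences have already been aligned to equal length (with "-" marking gaps) and counts identical columns over the full alignment length; it does not reproduce the BLAST or FASTA algorithms, which also construct the alignment. The function names and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same residue in both sequences."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be pre-aligned, non-empty and of equal length")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a, aligned_b, threshold=60.0):
    """Identity is only one criterion; similar function must be shown separately."""
    return percent_identity(aligned_a, aligned_b) >= threshold


parent = "MKTAYIAKQR"   # hypothetical aligned sequences
variant = "MKTAYLAKQR"  # one substitution (I -> L) in ten positions
print(percent_identity(parent, variant), meets_identity_threshold(parent, variant))
```

Other conventions for the denominator exist (for example, excluding gapped columns), so reported identities should state which convention was used.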

The term "DNA reassembly" is used when recombination occurs between identical 
sequences. "DNA shuffling" refers to a group of in vitro or in vivo methods involving 
recombination of nucleic acid species. For example, homologous recombination of pools 
of nucleic acid fragments or polynucleotides can be employed to generate polynucleotide 
molecules having variant sequences of the invention. 

"Isolation" or "purification" of a polypeptide or enzyme refers to the derivation of 
the polypeptide by removing it from its original environment (for example, from its natural 
environment if it is naturally occurring, or from the host cell if it is produced by 
recombinant DNA methods). Methods for polypeptide purification are well-known in the 
art, including, without limitation, preparative disc-gel electrophoresis, isoelectric focusing, 
HPLC, reversed-phase HPLC, gel filtration, ion exchange and partition chromatography, 
and countercurrent distribution. For some purposes, it is preferable to produce the 
polypeptide in a recombinant system in which the protein contains an additional sequence 
tag that facilitates purification, such as, but not limited to, a polyhistidine sequence. The 
polypeptide can then be purified from a crude lysate of the host cell by chromatography on 
an appropriate solid-phase matrix. Alternatively, antibodies produced against the protein 
or against peptides derived therefrom can be used as purification reagents. Other 
purification methods are possible. A purified polynucleotide or polypeptide may contain 
less than about 50%, preferably less than about 75%, and most preferably less than about 
90%, of the cellular components with which it was originally associated. A "substantially 
pure" enzyme indicates the highest degree of purity which can be achieved using 
conventional purification techniques known in the art. 

Polynucleotides are "hybridizable" to each other when at least one strand of one 
polynucleotide can anneal to another polynucleotide under defined stringency conditions. 
Stringency of hybridization is determined, e.g., by a) the temperature at which 
hybridization and/or washing is performed, and b) the ionic strength and polarity (e.g., 
formamide) of the hybridization and washing solutions, as well as other parameters. 
Hybridization requires that the two polynucleotides contain substantially complementary 
sequences; depending on the stringency of hybridization, however, mismatches may be 
tolerated. Typically, hybridization of two sequences at high stringency (such as, for 
example, in an aqueous solution of 0.5X SSC at 65°C) requires that the sequences exhibit 
a high degree of complementarity over their entire sequence. Conditions of 
intermediate stringency (such as, for example, an aqueous solution of 2X SSC at 65°C) 
and low stringency (such as, for example, an aqueous solution of 2X SSC at 55°C), require 
correspondingly less overall complementarity between the hybridizing sequences. (1X 
SSC is 0.15 M NaCl, 0.015 M Na citrate.) Polynucleotides that "hybridize" to the 
polynucleotides herein may be of any length. In one embodiment, such polynucleotides 
are at least 10, preferably at least 15 and most preferably at least 20 nucleotides long. In 
another embodiment, polynucleotides that hybridize are of about the same length. In 
another embodiment, polynucleotides that hybridize include those which anneal under 
suitable stringency conditions and which encode polypeptides or enzymes having the 
same function, such as the ability to catalyze an oxidation, oxygenase, or coupling reaction 
of the invention. 
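
The salt concentrations implied by the example stringency conditions follow directly from the 1X SSC composition given above. The short Python sketch below works through that arithmetic; the names `SSC_1X`, `EXAMPLE_CONDITIONS` and `ssc_molarity` are hypothetical and used only for illustration.

```python
# 1X SSC composition in mol/L, as stated in the text.
SSC_1X = {"NaCl": 0.15, "Na citrate": 0.015}

# (SSC fold-concentration, temperature in degrees C) for the example conditions in the text.
EXAMPLE_CONDITIONS = {
    "high": (0.5, 65),
    "intermediate": (2.0, 65),
    "low": (2.0, 55),
}


def ssc_molarity(fold):
    """Scale the 1X SSC composition to the given fold-concentration."""
    return {salt: conc * fold for salt, conc in SSC_1X.items()}


for name, (fold, temp_c) in EXAMPLE_CONDITIONS.items():
    m = ssc_molarity(fold)
    print(f"{name}: {fold}X SSC at {temp_c} C -> "
          f"{m['NaCl']:.3f} M NaCl, {m['Na citrate']:.4f} M Na citrate")
```

Lower salt and higher temperature both raise stringency, which is why 0.5X SSC at 65°C is the most demanding of the three examples.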

The general genetic engineering tools and techniques discussed here, including 
transformation and expression, the use of host cells, vectors, expression systems, etc., are 
well known in the art. 

Mutagenesis and Directed Evolution of Proteins 

To improve the expression and function of proteins using conventional expression 
systems, the invention makes the unexpected discovery that directed evolution can be used 
to generate mutant libraries of polynucleotides which, when expressed using conventional 
or facile expression systems, result in functional proteins having increased activity and/or 
thermostability. 

According to the invention, proteins that are expressed in facile gene expression 
systems can be obtained by using directed evolution to generate mutant polynucleotides in 
a library format for selection. General methods for generating libraries and isolating and 
identifying improved proteins (also described as "variants") according to the invention 
using directed evolution are described briefly below and more extensively, for example, in 
U.S. Patent Nos. 5,741,691 and 5,811,238. See also, International Applications WO 
98/42832, WO 95/22625, WO 97/20078, and WO 95/ and U.S. Patents 5,605,793 and 
5,830,721 (143, 149-156). It should be understood that any method for generating 
mutations in polynucleotide sequences to provide an evolved polynucleotide for use in 
expression systems can be employed. Proteins produced by directed evolution methods 
can then be screened for improved expression, activity, thermostability, folding, secretion, 
and other functions and properties according to conventional methods. 

Any source of nucleic acid in purified form can be utilized as the starting nucleic 
acid. Thus the process may employ DNA or RNA including messenger RNA, which DNA 
or RNA may be single or double stranded. In addition, a DNA-RNA hybrid which 
contains one strand of each may be utilized. The nucleic acid sequence may be of various 
lengths depending on the size of the nucleic acid sequence to be mutated. Preferably the 
specific nucleic acid sequence is from 50 to 50,000 base pairs. It is contemplated that 
entire vectors containing the nucleic acid encoding the protein of interest may be used in 
the methods of this invention. 

Any specific nucleic acid sequence can be used to produce the population of 
mutants by the present process. An initial population of the specific nucleic acid 
sequences having mutations may be created by a number of different known methods, 
some of which are set forth below. 

Error-prone polymerase chain reaction (20, 45, 46) and cassette mutagenesis (38-44), 
in which the specific region to be optimized is replaced with a synthetically mutagenized 
oligonucleotide, can be employed in the invention. Error-prone PCR uses low-fidelity 
polymerization conditions to introduce a low level of point mutations randomly over a 
long sequence, and can be used to mutagenize a mixture of fragments of unknown 
sequence. 
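
The effect of low-fidelity copying can be sketched as a toy simulation in which each base is replaced by a different base with a small, fixed probability. The Python below is an in silico illustration only, not a laboratory protocol: the function names, the per-base error rate and the template sequence are all hypothetical, and real error-prone PCR shows biased rather than uniform substitution patterns.

```python
import random


def error_prone_copy(template, error_rate, rng):
    """Copy a DNA string, substituting each base with probability `error_rate`."""
    copy = []
    for base in template:
        if rng.random() < error_rate:
            # Substitute with one of the three other bases, chosen uniformly.
            copy.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            copy.append(base)
    return "".join(copy)


def make_library(template, size, error_rate=0.005, seed=0):
    """Generate a library of `size` randomly mutated copies of the template."""
    rng = random.Random(seed)
    return [error_prone_copy(template, error_rate, rng) for _ in range(size)]


template = "ATGAAAGGCTGGTAA"  # hypothetical template
for variant in make_library(template, size=5, error_rate=0.05, seed=42):
    mutations = sum(a != b for a, b in zip(template, variant))
    print(variant, mutations)
```

Keeping the error rate low means most library members carry only a few point mutations, so each variant can still be compared meaningfully with the parent when the library is screened.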

Oligonucleotide-directed mutagenesis, which replaces a short sequence with a 
synthetically mutagenized oligonucleotide, may also be employed to generate evolved 
polynucleotides having improved expression. 

Alternatively, nucleic acid or DNA shuffling, which uses a method of in vitro or in 
vivo, generally homologous, recombination of pools of nucleic acid fragments or 
polynucleotides, can be employed to generate polynucleotide molecules having variant 
sequences of the invention. 

Parallel PCR is another method that can be used to evolve polynucleotides for 
improved expression, function or properties in conventional expression systems, which 
uses a large number of different PCR reactions that occur in parallel in the same vessel, 
such that the product of one reaction primes the product of another reaction. Sequences 
can be randomly mutagenized at various levels by random fragmentation and reassembly 
of the fragments by mutual priming. Site-specific mutations can be introduced into long 
sequences by random fragmentation of the template followed by reassembly of the 
fragments in the presence of mutagenic oligonucleotides. 

A particularly useful application of parallel PCR, which can be used in the 
invention, is called sexual PCR. In sexual PCR, also known as DNA shuffling, parallel 
PCR is used to perform in vitro recombination on a pool of DNA sequences. Sexual PCR 
can also be used to construct libraries of chimaeras of genes from different species. 
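
The recombination achieved by DNA shuffling can be sketched as a toy model in which a chimera is assembled segment by segment from equal-length homologous parents, switching parent at randomly chosen crossover points. The Python below is a simplification for illustration only: it does not model fragmentation, annealing or primer extension, and the function name and parent sequences are hypothetical.

```python
import random


def shuffle_parents(parents, n_crossovers, rng):
    """Build one chimera from equal-length parent sequences with random crossovers."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("this sketch assumes parents of equal length")
    # Choose distinct crossover positions strictly inside the sequence.
    points = sorted(rng.sample(range(1, length), n_crossovers))
    bounds = [0] + points + [length]
    pieces = []
    for a, b in zip(bounds, bounds[1:]):
        # Each segment is copied from a randomly chosen parent.
        source = parents[rng.randrange(len(parents))]
        pieces.append(source[a:b])
    return "".join(pieces)


rng = random.Random(7)
parents = ["AAAAAAAAAAAA", "CCCCCCCCCCCC", "GGGGGGGGGGGG"]  # hypothetical parents
for _ in range(3):
    print(shuffle_parents(parents, n_crossovers=3, rng=rng))
```

Because every position in the chimera comes from one of the parents at that same position, mutations from different parents can be combined without introducing new point mutations.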

The polynucleotide sequences for use in the invention can also be altered by 
chemical mutagenesis. Chemical mutagens include, for example, sodium bisulfite, nitrous 
acid, hydroxylamine, hydrazine or formic acid. Other agents which are analogues of 
nucleotide precursors include nitrosoguanidine, 5-bromouracil, 2-aminopurine, or acridine. 
Generally, these agents are added to the PCR reaction in place of the nucleotide precursor 
thereby mutating the sequence. Intercalating agents such as proflavine, acriflavine, 
quinacrine and the like can also be used. Random mutagenesis of the polynucleotide 
sequence can also be achieved by irradiation with X-rays or ultraviolet light, or by 
subjecting the polynucleotide to propagation in a host (such as E. coli) that is deficient in 
the normal DNA damage repair function. Generally, plasmid DNA or DNA fragments so 
mutagenized are introduced into E. coli and propagated as a pool or library of mutant 
plasmids. 

Alternatively a mixed population of specific nucleic acids may be found in nature 
in that they may consist of different alleles of the same gene or the same gene from 
different related species (i.e., cognate genes). Alternatively, they may be related DNA 
sequences found within one species, for example, the oxidase class of genes. Once the 
mixed population of the specific nucleic acid sequences is generated, the polynucleotides 
can be used directly or inserted into an appropriate cloning vector, using techniques 
well-known in the art. 

Once the evolved polynucleotide molecules are generated they can be cloned into a 
suitable vector selected by the skilled artisan according to methods well known in the art. 
If a mixed population of the specific nucleic acid sequence is cloned into a vector it can be 
clonally amplified by inserting each vector into a host cell and allowing the host cell to 
amplify the vector. The mixed population may be tested to identify the desired 
recombinant nucleic acid fragment. The method of selection will depend on the DNA 
fragment desired. For example, in this invention a DNA fragment which encodes for a 
protein with improved properties can be determined by tests for functional activity and/or 
stability of the protein. Such tests are well known in the art. 

Using the methods of directed evolution, the invention provides a novel means for 
producing functional and soluble proteins with improved activity toward one or more 
substrates. The mutants can be expressed in conventional or facile expression systems 
such as E. coli. Conventional tests can be used to determine whether a protein of interest 
produced from an expression system has improved expression, folding and/or functional 
properties. For example, to determine whether a polynucleotide subjected to directed 
evolution and expressed in a foreign host cell produces a protein with improved activity, 
one skilled in the art can perform experiments designed to test the functional activity of the 
protein. Briefly, the evolved protein can be rapidly screened, and is readily isolated and 
purified from the expression system or media if secreted. It can then be subjected to 
assays designed to test functional activity of the particular protein in native form. Such 
experiments for various proteins are well known in the art, and are discussed in the 
Examples below. 

In one embodiment, the invention contemplates the use of polynucleotides encoding 
variants of oxidase enzymes. The invention employs directed evolution to generate 
novel oxidase enzymes, such as GAO, which are expressed in host cells (e.g., E. coli) used 
in an expression system, and which exhibit increased functional activity and increased 
thermostability. 

The invention can also be applied to select or optimize an expression system, 
including selection of host cells, promoters, and signal sequences. Expression conditions 
can also be optimized according to the invention. 

Directed Evolution of Galactose Oxidase 

Galactose oxidase (EC 1.1.3.9) is an alcohol oxidase enzyme. It oxidizes the 
hydroxyl group of the sixth carbon of D-galactose. It also oxidizes many other kinds of 
sugars and alcohols (77, 108, 114, 115, 118-120). Although many fungi produce galactose 
oxidase, no bacterium has been reported to produce the enzyme (109). There are many 
reports about galactose oxidase from Fusarium ssp. NRRL2903, which is identical to 
Dactylium dendroides ATCC46032 (76-78, 84-86, 88, 95, 99, 108, 110-128). See FIG. 1. The 
native enzyme is an extracellular monomeric enzyme with a molecular weight of 67,000. 
It has one copper (II) ion associated with its active site and related to its oxidation 
properties. See FIG. 2. The structure and the amino acid residues related to catalysis have been 
characterized and reported (76, 78, 84-86, 88, 111-113, 116-119). 

Galactose oxidase is currently used mainly for assays of D-galactose and D- 
galactosamine. The enzyme oxidizes the hydroxyl group in the substrate to an aldehyde, 
which is reactive. Therefore, the enzyme is implicated for use in production of non-natural 
sugars and derivatives of sugars (118, 119, 95, 99, 128). Hyper-production of galactose 
oxidase would be useful for a wide variety of applications. The gene of the galactose 
oxidase has been cloned (110) and expressed in Escherichia coli (127). This recombinant 
galactose oxidase was produced as a fused protein with the N-terminal sequence of LacZ. 
However, the yield of the galactose oxidase by this recombinant E. coli was not 
satisfactory. 

According to the invention, galactose oxidase enzyme (GAO) has been produced in 
high activity and with improved properties by recombinant techniques in E. coli. 

The following Examples are understood to be exemplary only, and do not limit the 
scope of the invention or the appended claims. A person of ordinary skill in the art will 
appreciate that the invention can be practiced in many forms according to the claims and 
disclosures here. 

EXAMPLE 1 

Activity Assays for Galactose Oxidase Expressed in E. coli 

This Example describes assays used for evaluating galactose oxidase activity. 
Galactose oxidase generates equimolar amounts of hydrogen peroxide by oxidation of a 
substrate. Colorimetric detection of hydrogen peroxide was therefore used to assay 
galactose oxidase activity, employing the following reaction scheme: 

                 GAO                       peroxidase 
R-CH2OH + O2  -------->  R-CHO + H2O2  ---------------->  H2O + color change 
                                           chromogen 

This system can be used to assay for oxidation of various substrates, with a very 
high sensitivity. In the reaction scheme above, an alcohol group of a substrate R is 
oxidized to produce an aldehyde, and hydrogen peroxide (H2O2) is released. For example, 
D-galactose is converted to D-galactohexodialdose plus H2O2. The chromogen, in the 
presence of hydrogen peroxide and peroxidase enzyme, e.g. horseradish peroxidase (HRP), 
produces a detectable color change, indicating that the reaction catalyzed by GAO has 
occurred. 

A. Test Tube Assay 

The activity of galactose oxidase produced in E. coli was investigated using fungal 
galactose oxidase (Sigma, partially purified) as a standard. For detecting hydrogen 
peroxide with peroxidase (Sigma, type I from horseradish), a chromogen was selected for 
the GAO assays (85). 

1. Materials 

Cells. E. coli DH5αMCR (Life Technologies) was used for gene manipulation. E. 
coli BL21(DE3) (Novagen) was used as a host strain for expression of the galactose oxidase 
gene. E. coli KY-14478 (SN0029, lacking catalase, Kyowa Hakko Kogyo, Co. Ltd.) was 
also used for manipulation and expression of genes (157). Competent cells for 
electroporation were prepared (147). 

Cultivation Media. Luria-Bertani (LB) medium (10 g/l bacto tryptone, 5 g/l bacto 
yeast extract, 10 g/l NaCl, pH 7.5) was used mainly for cultivation of E. coli (19). LB 
plates contained 15 g/l agar in LB medium. Ampicillin (100 mg/l) was added to the 
medium when required. 

Buffers. Solutions of sodium phosphate, potassium phosphate and Tris-HCl at 
various pHs were tested as buffer solution for the assay. 

Chromogens. Many aromatic compounds can be used as a chromogen for the 
assay. Four chromogens showed particularly strong color formation (green, orange, red 
and red, respectively): (a) 2,2'-azinobis(3-ethylbenzothiazoline-6-sulfonic acid) (ABTS) 
(85); (b) o-anisidine; (c) o-dianisidine (127, 123, 121, 122) and (d) o-tolidine (114, 119). 
Their absorbance peaks were at 410 nm, 490 nm, 460 nm and 420 nm, respectively. 

2. Methods 

Cultivation. Three steps of cultivation were performed for production of galactose 
oxidase. Recombinant E. coli strains were cultivated on an LB plate containing ampicillin at 
30 °C for 18 hours. The cells were inoculated into LB containing ampicillin. After 
cultivation at 30 °C for 12 hours, the culture was transferred to a new test tube containing 3 
ml LB supplemented with ampicillin. The inoculation rate was 0.5 % of medium. 
Isopropyl beta-D-thiogalactopyranoside (IPTG) (1 mM) was added for induction after 
cultivation at 30 °C for 7 hours. Cultivation was continued at 30 °C for 6 hours. 

Permeabilization. Permeable cells were prepared by freezing (-20 °C) and thawing 
(4 °C), followed by treatment with 0.5 mg/l lysozyme (Sigma, from chicken egg white) for 30 
minutes at 37 °C. This pre-treatment for permeabilization was used for the assay in evaluating 
recombinant galactose oxidase (Example 3). 

Activity assay. The extract was assayed for galactose oxidase activity. Copper (II) 
sulfate solution (0.4 mM) was added to the cell-free extract. The cell-free extract was 
diluted in the buffer solution. Peroxidase (Sigma, type I from horseradish) (10 units/ml) 
and 2,2'-azinobis(3-ethylbenzothiazoline-6-sulfonic acid) (ABTS) (2 g/l) were added to the 
reaction solution. The reaction solution was pre-incubated at 37 °C for 5 minutes. 
Substrate was added to the solution to a final concentration of 100 mM. The increase in 
absorbance (410 nm or 405 nm) was measured at 37 °C for 1 minute. Fungal galactose 
oxidase (Sigma, partially purified) was used as a standard for estimation of the activity. 
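
The estimation of activity against the fungal standard can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of this specification: it assumes that absorbance rises linearly during the one-minute reading and that activity is proportional to the initial rate. The function names, the example readings and the 0.5 units/ml standard are hypothetical.

```python
def initial_rate_per_min(times_s, absorbances):
    """Least-squares slope of absorbance against time, in absorbance units per minute."""
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_a = sum(absorbances) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_s, absorbances))
    return 60.0 * sxy / sxx


def activity_units_per_ml(sample_rate, standard_rate, standard_units_per_ml, dilution=1.0):
    """Scale the sample rate by a standard of known activity, then undo the dilution."""
    return standard_units_per_ml * (sample_rate / standard_rate) * dilution


# Hypothetical readings at 405 nm, taken every 15 seconds for one minute.
times = [0, 15, 30, 45, 60]
sample_readings = [0.10, 0.15, 0.20, 0.25, 0.30]
standard_readings = [0.10, 0.20, 0.30, 0.40, 0.50]

sample_activity = activity_units_per_ml(
    initial_rate_per_min(times, sample_readings),
    initial_rate_per_min(times, standard_readings),
    standard_units_per_ml=0.5,  # assumed activity of the diluted standard
    dilution=10.0,              # assumed dilution of the cell-free extract
)
```

Fitting a slope over all readings, rather than subtracting the first reading from the last, makes the estimate less sensitive to noise in any single time point.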
3. Results 

From these experiments, ABTS was selected as a preferred chromogen for these 
types of assays, since ABTS formed its color most strongly and sensitively. Moreover, the 
highest assay sensitivity and lowest background was achieved when using a 100 mM 
sodium phosphate buffer solution (pH 7.0) for the assay. 

Minimum detectable activity of galactose oxidase for this assay system was 0.05 
units/ml. Galactose oxidase activity between 0.1 and 1 units/ml was measured 
quantitatively by photometer at 410 nm or 405 nm. 
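
The detection limit and quantitative range stated above can be applied to a measured value as in the following sketch. The category labels, the helper names and the target of 0.5 units/ml for re-assay are assumptions made for illustration.

```python
DETECTION_LIMIT = 0.05           # units/ml, minimum detectable activity
QUANTITATIVE_RANGE = (0.1, 1.0)  # units/ml, range measured quantitatively


def classify(units_per_ml):
    """Place a measured activity relative to the limits of the assay."""
    low, high = QUANTITATIVE_RANGE
    if units_per_ml < DETECTION_LIMIT:
        return "not detectable"
    if units_per_ml < low:
        return "detectable, not quantitative"
    if units_per_ml <= high:
        return "quantitative"
    return "above range"


def dilution_for_range(units_per_ml, target=0.5):
    """Dilution factor (at least 1) that brings a sample close to the target activity."""
    return max(1.0, units_per_ml / target)
```

A sample classified as "above range" would be diluted by the suggested factor and assayed again, so that the reading falls inside the quantitative range.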

Catalase produced by E. coli degrades hydrogen peroxide and may influence the 
assay. In practice, catalase was not observed to pose a problem, because the activity of the 
galactose oxidase was much higher than that of catalase. 

Provided below are additional galactose oxidase screening techniques and/or 
activity assays, having the following advantages: high specificity for galactose oxidase, 
high sensitivity, good reproducibility, quantitative measurements, simplicity, flexibility for 
many substrates, and low cost. One screening system utilizes microplates and the other 
utilizes membranes. Both systems use horseradish peroxidase (type I, Sigma) together 
with a chromogen (ABTS). 

B. Microplate Screening Method 

The following micro-plate assay has a high sensitivity. Moreover, the enzyme 
activity can be determined quantitatively. To increase throughput, the method can be 
automated, for example robotically. This method is particularly suitable as a second 
screen, after active clones are identified by a more rapid first screen, such as a membrane 
screen. In experiments using these procedures, the active cultures on the microplate had 
galactose oxidase activity as indicated by strong green color formation, where each 
positive well on the microplate was visible as a dark circle. GAO activity was screened in 
96-well plates. 

Briefly, single colonies were picked from LB-Ampicillin (LB-Ap) agar plates into 
deep-well plates and grown in LB-Ap. The master plates were duplicated into new 
deep-well plates containing LB-Ap with 1 mM IPTG. Following cultivation at 30 °C, CuSO4 was 
added and the cells were lysed with lysozyme and SDS. Cell extracts were reacted with 
galactose and allyl alcohol using the GAO-HRP coupled assay described above. 

1. Methods for Approach A 

Single colonies were picked from Luria-Bertani/100 µg/ml ampicillin (LB-Ap) 
agar plates into deep-well polypropylene plates (well depth: 2.4 cm; volume: 1 ml; from 
Becton Dickinson Labware) and cells were grown for 10 h at 30 °C and 270 rpm in 200 
µl LB-Ap. The master plates were duplicated by transferring a 10 µl aliquot to a new 
deep-well plate containing 300 µl LB-Ap and 1 mM isopropyl-beta-D-thiogalactopyranoside 
(IPTG) and grown for 12 h at 30 °C and 250 rpm. The cultures were then centrifuged for 
10 min at 5000 rpm and the cell pellet was resuspended in 300 µl 100 mM sodium 
phosphate (NaPi) buffer, pH 7.0, containing 0.4 mM CuSO4. Following addition of 0.5 
mg/ml lysozyme (35 min at 37 °C) and 2.5% (w/v) SDS (overnight at 4 °C), the GAO 
activity was assayed using the GAO-horseradish peroxidase (HRP) coupled assay (85). 
Aliquots of the cell extracts were reacted with galactose (50 mM for generation A1 or 25 
mM for generations A2 and A3) and allyl alcohol (0.5 M for all generations) at pH 7.0. 
The initial rate of H2O2 formation was followed by monitoring the HRP-catalyzed 
oxidation of 2,2'-azino-bis(3-ethylbenzthiazoline-6-sulfonic acid) (ABTS) at 405 nm. To 
assay thermostability, the plates were heated at a given temperature for 10 min, cooled 
down on ice for 10 min, and allowed to reach room temperature for ca. 5 min before the 
activity toward galactose was measured. The thermostability index was determined from 
the ratio of the residual GAO activity to the initial activity. Mutants identified as 
thermostable were then grown in test tubes (3 ml cultures) and the residual activity after 
heating at various temperatures was measured at room temperature. 
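
The thermostability index defined above (residual activity divided by initial activity) can be computed per well as in this sketch. The well names, the activity values and the rule of comparing each clone with a parent well on the same plate are hypothetical and are given only to illustrate the ratio.

```python
def thermostability_index(initial_activity, residual_activity):
    """Ratio of the activity remaining after heating to the activity before heating."""
    if initial_activity <= 0:
        raise ValueError("initial activity must be positive")
    return residual_activity / initial_activity


def more_stable_than_parent(wells, parent_well):
    """Return wells whose index exceeds that of the parent, most stable first.

    wells maps a well name to a tuple (initial activity, residual activity).
    """
    parent_index = thermostability_index(*wells[parent_well])
    scores = {
        well: thermostability_index(initial, residual)
        for well, (initial, residual) in wells.items()
        if well != parent_well
    }
    hits = [well for well, score in scores.items() if score > parent_index]
    return sorted(hits, key=lambda well: scores[well], reverse=True)


# Hypothetical plate readings: (initial, residual) activity in arbitrary units.
plate = {
    "A1": (0.80, 0.20),  # parent enzyme
    "B4": (0.75, 0.45),
    "C7": (0.90, 0.18),
    "D2": (0.60, 0.42),
}
candidates = more_stable_than_parent(plate, "A1")
```

Because the index is a ratio, a clone with lower expression but better heat tolerance (well D2 in the example) still ranks above a highly expressed but heat-labile clone.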
2. Methods for Approach B 

Single colonies were picked from LB-Ap agar plates into deep-well polypropylene 
plates (well depth: 4.4 cm; volume: 2.2 ml; from Qiagen) and cells were grown for 8 h at 
30 °C and 270 rpm in 500 µl LB-Ap. The master plates were duplicated by transferring a 
10 µl aliquot to a new deep-well plate containing 500 µl LB-Ap with 1 mM IPTG and grown 
overnight at 30 °C and 270 rpm. An aliquot of the culture was transferred to a microtiter 
plate. Following addition of 0.5 mg/ml lysozyme (30 min at 37 °C) and 0.4% (w/v) SDS with 
0.4 mM CuSO4 in 100 mM NaPi buffer, pH 7.0 (4 h at 4 °C), the GAO activity was assayed using 
the GAO-HRP coupled assay as described above. The galactose concentration used was 
25 mM (generations B1 and B2) or 10 mM (generations B3 and B4). 

C. Membrane Screening Method 

Although the micro-plate screening system is highly sensitive and quantitative, it 
is desirable to provide a method that contemporaneously assays many more, e.g. thousands 
more, clones in a sensitive, accurate, practical and efficient manner. Methods for detection 
of galactose oxidase activities directly from colonies on agar-plate were examined, but 
were found to exhibit relatively low sensitivity, low reproducibility, and very slow color 
formation. Hence, to evaluate a very large number of mutants, methods for detection of their 
activities directly from colonies on agar-plate or from colonies transferred onto a 
membrane were examined. These methods were based on colorimetric detection using 
chromogen and peroxidase, as in the micro-plate screening system. 

A suitable screening method using membranes was developed, as is shown here in 
one optimized form. After transformants formed colonies on an LB-Ap plate (100 mg/l, at 
30 °C for 18-24 hours), these colonies were transferred to a membrane, i.e. they were 
adsorbed onto the membrane and lifted. For cultivation, the membrane was placed on a 
new LB-Ap plate (100 mg/l) and was incubated at 30 °C until new colonies were formed on 
the membrane (6-12 hours). The membrane then was transferred to a new LB-Ap (100 
mg/l) plate with 1 mM IPTG, at 30 °C for 6 hours, for induction. Then, the membrane 
was put on a filter paper at room temperature, containing lysozyme (0.5 mg/ml), 
D-galactose (100 mM), ABTS (2 mg/ml), peroxidase (10 units/ml) and CuSO4 (0.4 mM). In 
experiments using these procedures, colonies which had galactose oxidase activities 
showed as deep purple on the filter paper. This simple method has suitable sensitivity and 
can be used to evaluate several thousand colonies on one membrane at once. 

This method can be used with an image analyzer for quantitative 
determination of the activity of each colony. Although the sensitivity of this method is not as 
high as that of the others, the method is fast and is suitable for a first or initial screening, 
because many thousands or even millions of colonies can be contemporaneously or rapidly 
evaluated. 

In a preferred embodiment, galactose oxidase activities of colonies which were 
transferred onto a membrane were estimated directly. Colonies, which were formed on an 
LB-Ampicillin plate at 30 °C for 24 hours, were transferred onto a membrane (Immobilon NC 
(HATF), surfactant-free, 0.45 µm, 82 mm, Millipore). The membrane was put on a new 
LB-Ampicillin plate and was kept at 30 °C for 6-12 hours until colonies were re-formed. 
Then the membrane was transferred onto an LB-Ampicillin plate containing 1 mM IPTG 
and was incubated for 6 hours at 30 °C. After the membrane was put on filter paper 
containing 0.5 mg/l lysozyme, 100 mM substrate, 2 mg/ml ABTS, 10 units/ml peroxidase 
and 0.4 mM CuSO4 in 100 mM sodium phosphate buffer solution (pH 7.0), the membrane 
was kept at room temperature for one day, covered with a shield (ABTS is light sensitive). 
Active colonies showed deep purple color formations. 

D. Assay Reagents and Conditions 

Some of the assays herein use CuSO4 and/or SDS. 

Copper sulfate is used to provide copper (II) ion to activate the recombinant 
(mutant or variant) enzyme. The activity of partially purified galactose oxidase from D. 
dendroides (Sigma) was detected well by using peroxidase and ABTS as described; the 
addition of copper (II) ion and other cofactors was not needed. (The Sigma enzyme 
already includes copper ions.) However, experiments with cell-free extracts of 
recombinant GAO enzymes of the invention showed that almost no activity was detected 
in the absence of copper (II) ions. Thus, the presence of copper (II) ion is preferred, and 
without being bound by any theory, is believed to be essential, to activate recombinant 
GAO enzymes produced by E. coli as described herein. Treatment with copper ions at 4 
°C is preferred. Copper ion can be provided as copper sulfate (CuSO4). Experiments 
showed that 0.1 mM CuSO4 is sufficient, whereas 10 mM CuSO4 slightly inhibited GAO 
activity. Experiments under assay conditions showed that the preferred concentration of 
CuSO4 for activating crude enzyme solution is 0.4 mM. The metal (II) ions of iron, cobalt, 
nickel, and manganese, and the metal chelator EDTA, did not affect activation of the 
recombinant GAO in experiments under assay conditions. Experimental results obtained 
under assay conditions, with and without various metal (II) ions or EDTA, are shown in FIG. 3. 

Detection enhancers. In certain assay embodiments, sodium azide or sodium 
sulfide may be added, for example in an amount of from about 0.01 mM to less than 1 
mM. These reagents may enhance detection of GAO activity in some circumstances. 

Detergents. Addition of detergents to the assay solution also increased the 
observed activity. Pretreatment with SDS was most effective for increasing the galactose 
oxidase activity. Treatment with SDS for longer than 12 hours at 4 °C after treatment with 
lysozyme was suitable for the assay. The galactose oxidase activity did not change within 
the treatment for 12 to 24 hours at 4 °C. Cultivation, pre-treatment and assay were done 
as described above. 

Other detergents may also be used, as shown in TABLE 1. In these experiments, 
approximately 0.1 units/ml culture of E. coli BL21(DE3)/pGAO-010 and 0.25 units of 
partially purified galactose oxidase (Sigma) were used. Cells were treated with 0.5 mg/ml 
lysozyme at 37 °C for 30 minutes. Enzyme and cells were treated with detergents at 4 °C 
for 1-12 hours. Galactose oxidase activities were assayed using the microplate method 
described above. 

Cultivation. Activation on an LB-Ap (100 mg/l) plate for 12-24 hours at 30 °C and 
seed-cultivation in LB-Ap (100 mg/l), 200-500 µl/well, for 8-10 hours at 30 °C provided 
uniform growth for cultivation. These conditions are suitable if not necessary for the 
assay, using the cells, reactants and reagents in these experiments. 

The addition of IPTG as an inducer was observed to be necessary for the 
expression of galactose oxidase in microplate cultivation in these experiments. Initial 
addition of IPTG to the medium was preferred to the addition of IPTG during cultivation. 
A cultivation time of 12-16 hours was preferred, and provided superior results (overall 
higher activities) for almost all recombinant E. coli which had a plasmid for expression of 
galactose oxidase in these experiments. Cell growth stopped before 16 hours. Cell 
extracts from cultures grown at 37 °C had almost no activity. A cultivation temperature 
of about 30 °C was optimal in these experiments. 

EXAMPLE 2 

Construction of Galactose Oxidase Plasmids 

Plasmids were constructed to express the galactose oxidase gene (gao) from Fusarium 
ssp. as described below. Several vectors were examined for high expression. Plasmids 
with different promoters and different sequences between the GAO gene and the ribosome 
binding site were constructed, as described. Escherichia coli strains BL21(DE3) and 
KY-14478 were transformed with these plasmids. Permeable cells from test tube cultures were 
used for the assay. 

A. Construction of Plasmids 

1. Modified pUC18 Vector Plasmids 

Modified pUC18 plasmids were made for use in constructing galactose 
oxidase expression plasmids. As shown in FIG. 7, vector pUC18 was digested with the 
restriction enzyme HindIII, blunted with T4 DNA polymerase and ligated with T4 DNA 
ligase to create vector pUC18-HL lacking the HindIII site. pUC18-HL was digested with 
EcoRI, blunted with T4 DNA polymerase and ligated with T4 DNA ligase to create vector 
pUC18-EHL lacking the EcoRI and HindIII sites. Similarly, pUC18-EHL was digested 
with PstI, blunted with T4 DNA polymerase and ligated with T4 DNA ligase to create 
vector pUC18-EHPL, lacking the EcoRI, HindIII, and PstI sites. 
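
Why digestion, blunting and re-ligation removes a restriction site can be illustrated at the sequence level. The sketch below relies on two well-known properties that are not stated in this specification: T4 DNA polymerase fills in 5' overhangs (HindIII, EcoRI) and trims 3' overhangs (PstI). The 25-base sequence is a hypothetical toy fragment, not pUC18, and it is treated as linear for simplicity.

```python
# Recognition sequence and position of the cut on the top strand (bases before the cut).
SITES = {
    "HindIII": ("AAGCTT", 1),  # A^AGCTT, 5' overhang
    "EcoRI": ("GAATTC", 1),    # G^AATTC, 5' overhang
    "PstI": ("CTGCAG", 5),     # CTGCA^G, 3' overhang
}


def blunt_and_religate(site, cut_offset):
    """Top-strand sequence left after cutting a palindromic site, blunting and re-ligating.

    Fill-in of a 5' overhang duplicates the overhang bases; trimming of a
    3' overhang deletes them. Both cases reduce to the same slicing.
    """
    return site[:len(site) - cut_offset] + site[cut_offset:]


def destroy_site(sequence, enzyme):
    """Replace the single occurrence of the enzyme's site by its blunted junction."""
    site, cut_offset = SITES[enzyme]
    if sequence.count(site) != 1:
        raise ValueError("expected exactly one " + enzyme + " site")
    return sequence.replace(site, blunt_and_religate(site, cut_offset))


toy = "GGGAATTCCCTGCAGTTAAGCTTGG"  # hypothetical fragment with one site per enzyme
step = toy
for enzyme in ("HindIII", "EcoRI", "PstI"):  # same order as in the text
    step = destroy_site(step, enzyme)
```

For HindIII the junction becomes AAGCTAGCTT and for EcoRI it becomes GAATTAATTC, so neither original hexamer survives; for PstI only CG remains of the original CTGCAG.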

2. GAO Vector Plasmids 

As shown in FIG. 8, plasmid pGAO-010 expressing GAO was made using plasmid 
pR3. Plasmid pR3 contains the gene for mature galactose oxidase (GAO) fused to the 5' 
end of the lacZ fragment, and was obtained from Dr. Howard K. Kuramitsu (Dept. of Oral 
Biology, State University of New York, Buffalo, NY). The GAO gene was amplified from 
pR3 by PCR using primers P-MY001 and P-MY002 in order to introduce a HindIII 
restriction site followed by an ATG initiation codon immediately upstream from the 
mature GAO sequence, and an XbaI site immediately downstream from the stop codon. 
(Primer sequences are shown in FIG. 6.) The PCR product was digested with HindIII and 
XbaI and ligated into a similarly digested pUC18 vector to create pGAO-001. Plasmid 
pPLA-001 is a modified pUC18 vector containing a double lac promoter. The lac 
promoter from pUC18 was amplified using primers P-MY003 and P-MY004. The PCR 
product was digested with EcoRI and HindIII and ligated into a similarly digested pUC18 
vector. Following digestion of pGAO-001 with HindIII and XbaI, pPLA-001 with EcoRI 
and HindIII, and pUC18-HL with EcoRI and XbaI, plasmid pGAO-010 was generated by 
ligation with T4 DNA ligase. 

Another plasmid, pGAO-036, was made by amplifying pGAO-010 using primers 
P-MY036 and P-MY002. FIG. 9. The PCR product was digested with [illegible] and XbaI and 
ligated with a similarly digested pUC18-EHL to create plasmid pGAO-027. Plasmid 
pGAO-027 was digested with KpnI and XbaI and ligated with a similarly digested 
pUC18-EHPL to create plasmid pGAO-036. This plasmid contains a unique PstI site. 
Plasmid pGAO-036 was used for directed evolution experiments described herein. 

Another plasmid, pGAO-011, was made using similar techniques, as shown in 
FIG. 10. 

B. Plasmids and transformation 

Plasmids for expression of galactose oxidase were constructed as described above. 
The galactose oxidase gene was amplified from pR3 (Fusarium ssp.) by PCR. The lac 
promoter of pUC18 and the T7 promoter of pET-22b(+) (Novagen) were used for expression. 
In addition to expression as the mature sequence of galactose oxidase, expression of the gene 
as a fused protein with other peptides was examined. The N-terminal sequence of LacZ 
was selected to express the galactose oxidase as a fused protein (127). The PelB leader 
sequence was also used to produce galactose oxidase in the periplasm. Furthermore, a His-tag, 
which is useful for purification of recombinant proteins, was examined as an additional 
sequence at the C-terminus of galactose oxidase. The T7 terminator sequence was used for 
stabilization of expression. Two different origins of replication (ori) were chosen for the 
plasmids. The copy number of a plasmid with the ori from the pUC series is higher than that 
of a plasmid with the ori from the pBR series. 

In more detail, plasmids pUC18, pET-22b(+) (Novagen) and derivatives were used 
as vector plasmids. The galactose oxidase gene from Fusarium ssp. was amplified from pR3 
according to known techniques (110, 127). Genes were manipulated according to 
conventional methods using kits from Qiagen (Valencia, CA). The QIAprep Spin 
Miniprep Kit, QIAquick Gel Extraction Kit and QIAEX II Gel Extraction Kit were used 
respectively for purification of plasmids from cells, purification of DNA fragments and 
extraction of DNA fragments from agarose gel. E. coli DH5αMCR was transformed with 
plasmids by treatment with CaCl2 (19). Electroporation was used for transformation of E. 
coli BL21(DE3) with plasmids (147, 148). 

pUC18 and pET-22b(+) (Novagen) were used as vector plasmids. The gene of 
galactose oxidase from pR3 (127) was used. The lac promoter from pUC18, the tac promoter 
from pKK223-3 (Amersham Pharmacia Biotech) and the T7 promoter from pET-22b(+) were 
selected for expression of the gene. The N-terminal sequence of LacZ from pUC18, and the 
PelB leader, His-tag and T7 terminator sequences from pET-22b(+), were used for production of 
galactose oxidase. The gene and parts for expression were prepared by PCR. PCR was 
done in 100 µl of reaction solution containing PCR buffer (10 mM Tris-HCl, pH 8.5, 50 
mM KCl, 2.5 mM MgCl2, 0.01 % gelatin), 1 ng of DNA as template, 50 pmol of each 
primer, 2.5 units of Taq DNA polymerase (Perkin Elmer) and 50 nmol of each dNTP. 
DNA fragments were amplified in 30 cycles of 30 seconds at 94 °C, 30 seconds at 50 °C 
and 60 seconds at 72 °C. PCR products were purified with the QIAquick PCR Purification Kit 
(Qiagen). Cutting and ligation of DNA by enzymes were performed according to "Molecular 
Cloning" (19). E. coli cells were transformed with plasmids by electroporation (Bio-Rad, 
Gene Pulser). The QIAprep Spin Miniprep Kit (Qiagen) was used for purification of plasmid 
from recombinant E. coli cells. 
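
The amounts given for the PCR mixture can be converted to final concentrations, and the cycling program to a minimum run time, as in the sketch below. It assumes a reaction volume of 100 microliters, which is the conventional scale for the stated primer and dNTP amounts; the function names are hypothetical, and ramping between temperatures is ignored.

```python
def concentration_um(amount_pmol, volume_ul):
    """Picomoles per microliter is numerically equal to micromolar."""
    return amount_pmol / volume_ul


def cycling_minutes(cycles, step_seconds):
    """Total hold time of the program in minutes, excluding temperature ramps."""
    return cycles * sum(step_seconds) / 60.0


REACTION_VOLUME_UL = 100.0  # assumed reaction volume in microliters

primer_um = concentration_um(50.0, REACTION_VOLUME_UL)         # 50 pmol of each primer
dntp_um = concentration_um(50.0 * 1000.0, REACTION_VOLUME_UL)  # 50 nmol of each dNTP
hold_minutes = cycling_minutes(30, (30, 30, 60))               # 94 °C, 50 °C, 72 °C steps
```

Under this assumption each primer is at 0.5 micromolar and each dNTP at 500 micromolar (0.5 mM), and the 30 cycles account for 60 minutes of hold time.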

Using these strategies, plasmids were designed to express the galactose oxidase 
gene. The plasmids were transformed into E. coli DH5αMCR, BL21(DE3) and KY-14478. 
Representative plasmids are shown diagrammatically in FIG. 11, according to the general 
scheme shown in FIG. 12. 

Expression of the galactose oxidase gene in all constructed plasmids was 
controlled by the lac operator. Therefore, induction by isopropyl 
beta-D-thiogalactopyranoside (IPTG) was necessary for production of the enzyme (FIG. 11). The 
expression of galactose oxidase was highest when IPTG (1 mM) was added after 
cultivation for 7 hours and cells were incubated for 6 more hours. Cultivation at 30 °C 
gave the greatest galactose oxidase activity per culture. Expression of the enzyme was 
markedly decreased at 37 °C. Temperatures lower than 27 °C were not suitable in the 
experiments because the cells grew very slowly. 

Incubation on LB plate at 30 °C for 18 hours and pre-cultivation in LB at 30°C for 
12 hours stabilized the main cultivation. The optimal culture conditions were selected as 
shown above. 

C. Galactose oxidase activity 

Galactose oxidase activities of the recombinant E. coli were measured (FIG. 11). 
Some recombinant strains showed much higher activities than the strain carrying plasmid 
pR3. These recombinants carry plasmids which were constructed with the lac promoter and 
the ori from the pUC series. Some recombinant E. coli with plasmids pGAO-018 and 
pGAO-023, expressing the galactose oxidase gene from the T7 promoter, did not grow well. Their 
galactose oxidase activities were not detected. Although some recombinants carrying 
plasmids with the T7 promoter, pGAO-008 and pGAO-009, grew normally, they showed low 
galactose oxidase activity. From these results, the lac promoter was suitable for expression of 
the galactose oxidase gene. Furthermore, a double lac promoter seemed to be stronger than 
a single lac promoter in some but not all cases. 

For example, plasmid pGAO-025 was designed to have a double lac promoter and 
a lacZ-gao fused gene (FIG. 13). However, the galactose oxidase activity of a recombinant 
with pGAO-025 was almost the same as that of a recombinant with pGAO-011, which had a 
single lac promoter, in KY-14478 cells, but was higher than that with pGAO-011 in BL21(DE3) 
cells. A triple lac promoter was also examined to express the galactose oxidase gene. The 
effect of the triple promoter was about the same as that of the double promoter, e.g. in 
pGAO-028 and pGAO-010 (FIGS. 15 and 17). 

Galactose oxidase which was fused with the N-terminal sequence of LacZ or the PelB 
leader was produced, as well as non-fused proteins. The activity of galactose oxidase 
fused with the PelB leader was not detected without a pre-treatment of cells. Detection of 
activity of the enzyme required the same pre-treatment of recombinant cells as the others. In 
these experiments GAO was not secreted into the medium, although a secretion signal 
sequence was present. 

Plasmids pGAO-003 and pGAO-005 were designed to produce galactose oxidase 
in fused form with His-tag at its C-terminal. No galactose oxidase activity was detected 
from recombinant strains with these plasmids. 

Terminator sequence sometimes stabilizes gene expression. In these experiments, 
introduction of the T7 terminator sequence apparently did not increase GAO expression. 
Compare pGAO-020 with pGAO-010 or pGAO-022 with pGAO-017. 

E. coli DH5αMCR expressed the galactose oxidase gene with these plasmids. 
However, their activities were lower than those of recombinant strains of E. coli BL21(DE3) 
and E. coli KY-14478 (data not shown). E. coli BL21(DE3) and E. coli KY-14478 with 
plasmid pGAO-010 or pGAO-027 successfully expressed galactose oxidase with high 
activity. These two plasmids have the same sequence except for one restriction 
endonuclease site in the vector sequence. Their structure is suitable for expressing the 
galactose oxidase as the mature fungal sequence. Consequently, E. coli BL21(DE3) and E. 
coli KY-14478 harboring plasmid pGAO-010, pGAO-027 or their derivatives were used 
for continued experiments. 

D. Codon Alternation 

Codon alternation of the N-terminal sequence of a gene, without changing the 
peptide sequence, may cause higher expression of the gene in some cases. Codons of six 
N-terminal amino acid residues of galactose oxidase were exchanged randomly by PCR 
with a mixed primer, with the following alternations. 

                                                         SEQ ID NO: 

                    (M)  A   S   A   P   I   G   S   A       26 
Wild-type sequence  ATG GCC TCA GCA CCT ATC GGA AGC GCC ...  27 
Random Alternation      --N --N --N --N --A --N ...          28 
                                          T 
                                          C 

The galactose oxidase gene of pGAO-010 was replaced with PCR products 
comprising the galactose oxidase gene with random codon alternation. The plasmids of 
this library were named pGAO-010M. This random codon alternation of the N-terminal 
sequence did not cause higher expression (FIG. 14), and in many cases GAO activity was 
reduced. No significant difference was observed when E. coli KY-14478 was used as a 
host strain, compared with E. coli BL21(DE3). 
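
The size of the library encoded by such a mixed primer can be enumerated directly. The sketch below assumes the degeneracy shown in the alternation scheme: the third base of the codons for Ala, Ser, Ala, Pro and Gly is fully randomized (N), and the third base of the Ile codon is A, T or C. The variable and function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"

# (amino acid, fixed first two bases of the codon, allowed third bases)
POSITIONS = [
    ("A", "GC", BASES),
    ("S", "TC", BASES),
    ("A", "GC", BASES),
    ("P", "CC", BASES),
    ("I", "AT", "ACT"),  # ATG would encode Met, so G is not allowed here
    ("G", "GG", BASES),
]

WILD_TYPE = "ATGGCCTCAGCACCTATCGGA"  # ATG plus the six wild-type codons


def synonymous_variants():
    """All sequences encoded by the mixed primer, each starting with ATG."""
    third_base_sets = [thirds for _, _, thirds in POSITIONS]
    variants = []
    for thirds in product(*third_base_sets):
        codons = [prefix + base for (_, prefix, _), base in zip(POSITIONS, thirds)]
        variants.append("ATG" + "".join(codons))
    return variants


library = synonymous_variants()
```

Under this assumption the library holds 4 x 4 x 4 x 4 x 3 x 4 = 3072 distinct sequences, all encoding the same peptide, and the wild-type sequence is one of them.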

E. Optimization of the upstream sequence of gao 

The region between the Shine-Dalgarno ("SD") sequence AGGA and the initiation 
codon, ATG, is important for efficient RNA translation and has a significant influence on 
expression of the gene. One to three bases were inserted between the SD of the lac promoter and 
the ATG of the galactose oxidase gene in pGAO-027 to investigate the impact of altering 
the distance between SD and ATG. A change in the length of the region between SD and 
ATG caused a decrease in galactose oxidase activity when E. coli BL21(DE3) was used as 
a host strain (TABLE 2; SEQ ID NOS: 29-36). The original sequence of pGAO-027 or 
the one-base extended sequence of pGAO-029 was preferred for expression of the gene. 
When E. coli KY-14478 was used as a host strain, a one- or two-base extension of the 
sequence between SD and ATG was preferred for expression of the gene. 

TABLE 2 

                                                      GAO Activity (units/ml) 
Plasmid*  Sequence between SD and ATG   Promoter      BL21(DE3)    KY-14478 
027       ...AGGAAAAGCTTATG...          Plac          19.0         12.5 
029       ...AGGAAAAAGCTTATG...         Plac          19.1         15.7 
030       ...AGGAAACAAGCTTATG...        Plac          16.3         15.9 
031       ...AGGAACAAAGCTTATG...        Plac          14.3         13.1 
032       ...AGGAAAAGCTTATG...          Ptac          30.6         52.4 
033       ...AGGAAAAAGCTTATG...         Ptac          25.7         56.2 
034       ...AGGAAACAAGCTTATG...        Ptac          34.6         49.8 
035       ...AGGAACAAAGCTTATG...        Ptac          22.1         38.7 

*Plasmids are designated pGAO-XXX, where XXX is 027 through 035 



The tac promoter often if not usually expresses genes at higher levels than the lac 
promoter. The tac promoter was prepared from pKK223-3 (Amersham Pharmacia Biotech) by 
PCR. The lac promoters of plasmids pGAO-027, pGAO-029, pGAO-030 and pGAO-031 were 
replaced with the tac promoter. Recombinant strains with plasmids using the tac promoter for 
expression showed approximately twice as much activity as the recombinant strains 
using the lac promoter (TABLE 2). The optimal distance between SD and ATG under the tac 
promoter was almost the same as that under the lac promoter in both E. coli strains. 

Recombinant strains E. coli BL21(DE3)/pGAO-034 and E. coli KY-14478/pGAO-033 
were considered to be good for expression of galactose oxidase. Optimal culture 
conditions for these strains were as described above. 
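
The comparison reported in TABLE 2 can be re-examined with a short script. The activity values and sequences below are transcribed from TABLE 2; the data layout, the function names and the pairing of each lac plasmid with the tac plasmid carrying the same SD-to-ATG sequence are assumptions made for this sketch.

```python
# plasmid -> (sequence shown in TABLE 2, promoter, BL21(DE3) units/ml, KY-14478 units/ml)
TABLE_2 = {
    "027": ("AGGAAAAGCTTATG", "lac", 19.0, 12.5),
    "029": ("AGGAAAAAGCTTATG", "lac", 19.1, 15.7),
    "030": ("AGGAAACAAGCTTATG", "lac", 16.3, 15.9),
    "031": ("AGGAACAAAGCTTATG", "lac", 14.3, 13.1),
    "032": ("AGGAAAAGCTTATG", "tac", 30.6, 52.4),
    "033": ("AGGAAAAAGCTTATG", "tac", 25.7, 56.2),
    "034": ("AGGAAACAAGCTTATG", "tac", 34.6, 49.8),
    "035": ("AGGAACAAAGCTTATG", "tac", 22.1, 38.7),
}
HOST_COLUMN = {"BL21(DE3)": 2, "KY-14478": 3}
LAC_TO_TAC = {"027": "032", "029": "033", "030": "034", "031": "035"}


def spacer_length(sequence):
    """Number of bases between the SD sequence AGGA and the ATG initiation codon."""
    return sequence.rindex("ATG") - (sequence.index("AGGA") + 4)


def best_plasmid(host):
    """Plasmid with the highest activity in the given host."""
    column = HOST_COLUMN[host]
    return max(TABLE_2, key=lambda plasmid: TABLE_2[plasmid][column])


def tac_over_lac(host):
    """Fold change in activity when the lac promoter is replaced by tac."""
    column = HOST_COLUMN[host]
    return {
        lac: TABLE_2[tac][column] / TABLE_2[lac][column]
        for lac, tac in LAC_TO_TAC.items()
    }
```

With these values, `best_plasmid` returns 034 for BL21(DE3) and 033 for KY-14478, in line with the strains named above, and the fold change from lac to tac is larger in KY-14478 than in BL21(DE3) for every spacer.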

F. Properties of recombinant galactose oxidase 

Galactose oxidase from Dactylium dendroides (Fusarium ssp.) and the enzyme 
from recombinant E. coli BL21(DE3)/pGAO-010 differ only in glycosylation; their amino 
acid sequences are identical. 

Substrate specificities of recombinant galactose oxidase from E. coli and the 
enzyme from fungi were compared. A cell-free extract of E. coli BL21(DE3)/pGAO-010 
was used as the crude recombinant enzyme from E. coli. Partially purified galactose oxidase 
from Dactylium dendroides (Sigma) was used as the fungal enzyme. 
The substrate specificities of these two enzymes were almost the same (FIG. 15). 

EXAMPLE 3 

Optimization of error-prone PCR conditions 

A. General PCR Conditions 

Mutation of the galactose oxidase gene (gao) was induced by error-prone PCR 
according to known techniques (66, 129-133, 136-139). Wild-type gao on pGAO-027 
was replaced by the PCR products, which were mutant galactose oxidase genes. The 
resultant plasmids were named pGAO-027M. E. coli BL21(DE3) was transformed with 
these plasmids. Almost all transformants carrying error-prone PCR products instead of 
wild-type gao lost their galactose oxidase activities (FIG. 7). Mutations were induced in 
the whole galactose oxidase gene by error-prone PCR, using conditions "A" of TABLE 3. 
228 clones were selected randomly from each set of conditions with different manganese 
concentrations. These clones were cultivated and assayed with micro-plates. More than 
65 % of transformants lost their galactose oxidase activity, even when manganese ions 
were not added to the PCR solution. 

Various reaction conditions for error-prone PCR were compared, and in particular 
milder conditions were examined for mutation of the galactose oxidase gene. Conditions 
"A" and "C" are the previous conditions of error-prone PCR (above) and normal PCR 
conditions, respectively. The use of a buffer solution for error-prone PCR (Buffer EP) 
increased the error rate. A non-uniform composition of dNTPs for error-prone PCR (dNTPs 
EP) induced mutations at a higher rate than the uniform composition of dNTPs for normal 
PCR (dNTPs normal). Taq DNA polymerase from Promega Corporation showed a higher 
error rate than the enzyme from Perkin Elmer. Since the rate of inactivation was 31 % at 
most in condition "C" (FIG. 5), induction of mutation was not optimal, and may have been 
insufficient. In FIG. 5, mutations were induced in the whole galactose oxidase gene by 
error-prone PCR using conditions "C" of TABLE 3. Activities of 288 clones from each 
set of conditions with different manganese concentrations were estimated using micro-plate 
screening. 

From the alternatives examined in these experiments, error-prone PCR condition 
"F" had a suitable frequency of error and was selected to induce mutation on the galactose 
oxidase gene in further experiments. The composition of the buffer solution, the content of 
dNTPs and the thermophilic DNA polymerase each affected the rate of mutation. For 
example, the difference between the buffer solution for normal PCR and the buffer 
solution for error-prone PCR was that the EP buffer contained gelatin. Since gelatin is not 
expected to influence the error rate of the PCR reaction, the observed rate difference may 
be due to a small difference in the final pH of reaction mixtures with these buffer 
solutions. More errors were induced by the non-uniform content of dNTPs for error-prone PCR 
than by the uniform content of dNTPs for normal PCR. Selection of the thermophilic DNA 
polymerase can be significant when optimizing an error-prone PCR experiment, as the 
particular polymerase may influence the mutation rate. 

The PCR conditions selected for mutation of the whole galactose oxidase gene in these 
experiments were milder than previously disclosed conditions (66, 129-133, 136-139). 
When the PCR conditions described previously were used for error-prone PCR of the 
galactose oxidase gene, the mutation rate was too high, resulting in too many inactive or 
low-activity clones. This result may be related to the fact that the galactose oxidase gene is 
as much as twice as large as genes previously used for error-prone PCR in the literature. 
Without being bound by any theory, deadly mutations may be induced more frequently as 
the target gene becomes larger. 

In TABLE 3, 96 or 288 clones were selected randomly from each library. Their 
galactose oxidase activities were estimated by the micro-plate screening method. The percentages of 
clones that lost their galactose oxidase activity are shown in the table. 

FIG. 4 and FIG. 5 show the effect of varying amounts of MnCl2 in these 
experiments. 

In the mutagenesis methods used herein, the error rate is from 1-6 mutations per 
polynucleotide, preferably 4-6, and most preferably 6. In certain embodiments with more 
than one round of directed evolution, the error rate may be different from one round to 
another. For example, the error rate may be about 1-2 mutations per polynucleotide in one 
round (e.g. a first round), and may be about 4-6 mutations per polynucleotide in another 
round (e.g. a second round). 
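As a rough illustration of what these error rates mean for a gene of this size, the following sketch converts mutations per polynucleotide into a per-kilobase frequency, using the 1917-base length of the whole GAO gene given in Example 4. The Poisson estimate of the fraction of clones carrying no mutation is an assumption added here for illustration (it treats mutations as independent and evenly distributed) and is not part of the described method; the function names are hypothetical.

```python
import math

GAO_GENE_LENGTH_BP = 1917  # whole GAO gene, bases 1-1917 (Example 4)


def mutations_per_kb(mutations_per_gene, gene_length_bp=GAO_GENE_LENGTH_BP):
    """Convert an error rate per polynucleotide into mutations per 1000 bases."""
    return mutations_per_gene / gene_length_bp * 1000.0


def unmutated_fraction(mutations_per_gene):
    """Assumed Poisson model: probability that a clone carries zero mutations."""
    return math.exp(-mutations_per_gene)


if __name__ == "__main__":
    for m in (1, 2, 4, 6):
        print(f"{m} mutation(s) per gene: "
              f"{mutations_per_kb(m):.2f} per kb, "
              f"unmutated fraction (Poisson assumption) {unmutated_fraction(m):.3f}")
```

Under this simplified model, a higher error rate leaves fewer unmutated clones in the library, which is one reason a different rate may suit different rounds.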

TABLE 3 

PCR conditions and inactivated clones [%], given as percent (inactive clones/clones assayed) 

Cond. Buffer  MgCl2   dNTPs   Taq DNA polymerase      MnCl2 0 mM     MnCl2 0.1 mM   MnCl2 0.15 mM  MnCl2 0.2 mM   MnCl2 0.4 mM   MnCl2 0.5 mM 
A     EP      7 mM    EP      Promega, 50 u/ml        60 (173/288)   69 (199/288)   77 (223/288)   76 (220/288)   90 (258/288)   94 (270/288) 
B     EP      7 mM    normal  Promega, 50 u/ml        55 (53/96)     61 (59/96)     -              -              -              - 
C     normal  2.5 mM  normal  Perkin Elmer, 25 u/ml   3 (3/96)       10 (10/96)     -              -              -              - 
                                                      5 (14/288)     9 (27/288)     10 (29/288)    11 (31/288)    28 (81/288)    31 (90/288) 
D     EP      7 mM    EP      Perkin Elmer, 25 u/ml   45 (43/96)     61 (59/96)     -              -              -              - 
E     EP      7 mM    EP      Perkin Elmer, 50 u/ml   39 (37/96)     52 (50/96)     -              -              -              - 
F     normal  7 mM    EP      Perkin Elmer, 25 u/ml   23 (22/96)     41 (39/96)     -              -              -              - 
G     normal  7 mM    EP      Promega, 50 u/ml        41 (39/96)     52 (50/96)     -              -              -              - 
H     EP      7 mM    normal  Promega, 50 u/ml        51 (49/96)     61 (59/96)     -              -              -              - 

Buffer EP       : (x10) 500 mM KCl, 100 mM Tris-HCl (pH 8.3), 0.1% (w/v) gelatin 

Buffer (normal) : (x10) 500 mM KCl, 100 mM Tris-HCl (pH 8.3) 

dNTPs EP        : 0.2 mM dGTP, 0.2 mM dATP, 1 mM dCTP, 1 mM dTTP 

dNTPs (normal)  : 0.5 mM dGTP, 0.5 mM dATP, 0.5 mM dCTP, 0.5 mM dTTP 
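The percentages in TABLE 3 follow directly from the clone counts given in parentheses. A minimal sketch of that arithmetic, using the counts reported for conditions "A" and "C" with 288 clones per MnCl2 concentration, is shown below; the function and variable names are hypothetical.

```python
def percent_inactivated(inactive, assayed):
    """Percentage of assayed clones that lost galactose oxidase activity."""
    if assayed <= 0:
        raise ValueError("number of assayed clones must be positive")
    return 100.0 * inactive / assayed


# MnCl2 concentration [mM] -> (inactive clones, clones assayed), from TABLE 3
condition_a = {0.0: (173, 288), 0.1: (199, 288), 0.15: (223, 288),
               0.2: (220, 288), 0.4: (258, 288), 0.5: (270, 288)}
condition_c = {0.0: (14, 288), 0.1: (27, 288), 0.15: (29, 288),
               0.2: (31, 288), 0.4: (81, 288), 0.5: (90, 288)}

if __name__ == "__main__":
    for label, data in (("A", condition_a), ("C", condition_c)):
        for mn_mm, (inactive, assayed) in data.items():
            pct = percent_inactivated(inactive, assayed)
            print(f"condition {label}, MnCl2 {mn_mm} mM: {pct:.0f}% inactive")
```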
EXAMPLE 4 
Production of Galactose Oxidase Mutants 

The directed evolution of galactose oxidase (GAO) is described. GAO variants 
with increased activity toward allyl alcohol and D-galactose and increased thermostability 
relative to wild-type have been identified. 

A. Construction of GAO Mutant Libraries 

Plasmid pGAO-036, expressing wild-type GAO, was used as the parent for the 
directed evolution of GAO (FIG. 9). 

Two strategies have been followed for the directed evolution of the enzyme: (A) 
mutagenesis of the whole GAO gene (bases 1-1917) and (B) mutagenesis of part of the 
GAO gene (bases 518-1917). In Approach A, two rounds of error-prone PCR (45) have 
been performed (generations A1 and A2), followed by one round of StEP recombination 
(generation A3) (139) of four improved variants identified in library A2. In Approach B, 
four rounds of error-prone PCR (45) have been performed (generations B1 through B4). E. 
coli strain BL21(DE3) (Novagen) was used for the expression of GAO. 

1. Approach A 

Error-prone PCR was carried out in a 100 µl reaction mixture containing about 0.3 
µg plasmid DNA as template, 30 pmol of each primer, 0.2 mM dGTP, 0.2 mM dATP, 1 
mM dCTP, 1 mM dTTP, 7 mM MgCl2, 0.1 mM MnCl2, and 2.5 U Taq polymerase (Perkin 
Elmer) in 10 mM Tris-HCl, 50 mM KCl buffer, pH 8.5. PCR conditions were as follows: 
30 cycles of 94 °C for 30 seconds, 50 °C for 30 seconds and 72 °C for 60 seconds. The 
percentage of inactive clones was between 30 and 50%. 

StEP recombination of the four improved variants identified in generation A2 was 
performed in a 100 µl reaction mixture containing about 0.3 µg (total) plasmid DNA as 
template (prepared by mixing equal amounts of all four plasmids), 10 pmol of each primer, 
0.5 mM of each dNTP, 2.5 mM MgCl2, and 5 U Taq polymerase (Perkin Elmer) in 10 mM 
Tris-HCl, 50 mM KCl buffer, pH 8.5. PCR conditions were: 95 °C for 3 minutes and 100 
cycles of 94 °C for 30 seconds and 58 °C for 10 seconds. The primers used for error-prone 
PCR and StEP were: 

5'-AATTCGAAGCTTATGGCCTCAGCACCTATCGGAAGC-3' (forward) [SEQ. ID. NO. 1] 

and 5'-CTTCCTTCTAGATTACTGAGTAACGCGAATCGT-3' (reverse) [SEQ. ID. NO. 2]. 
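Primer sequences such as these can be checked for embedded restriction sites before use. The sketch below scans the two Approach A primers for the standard recognition sequences of HindIII (AAGCTT) and XbaI (TCTAGA). The choice of these two enzymes is an assumption made for illustration; this section does not state which restriction sites were used for cloning, and the function name is hypothetical.

```python
# Standard recognition sequences; which enzymes were actually used for
# cloning is not stated in the text and is assumed here for illustration.
SITES = {"HindIII": "AAGCTT", "XbaI": "TCTAGA"}

FORWARD_A = "AATTCGAAGCTTATGGCCTCAGCACCTATCGGAAGC"   # SEQ. ID. NO. 1
REVERSE_A = "CTTCCTTCTAGATTACTGAGTAACGCGAATCGT"      # SEQ. ID. NO. 2


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for every site found in seq."""
    seq = seq.upper()
    hits = {}
    for name, motif in sites.items():
        positions = [i for i in range(len(seq) - len(motif) + 1)
                     if seq.startswith(motif, i)]
        if positions:
            hits[name] = positions
    return hits


if __name__ == "__main__":
    print("forward:", find_sites(FORWARD_A))
    print("reverse:", find_sites(REVERSE_A))
```

In the forward primer the AAGCTT motif is immediately followed by ATG, and in the reverse primer the TCTAGA motif is followed by TTA, the reverse complement of a TAA stop codon.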

2. Approach B 

Error-prone PCR was carried out in a 100 µl reaction mixture containing 10 ng 
plasmid DNA as template, 50 pmol of each primer, 0.2 mM of each dNTP, 7 mM 
(generations B1 and B2) or 4 mM MgCl2 (generations B3 and B4), and 5 U Taq 
polymerase (Boehringer Mannheim) in 10 mM Tris-HCl, 50 mM KCl buffer, pH 8.3. PCR 
conditions were as follows: 94 °C for 2 minutes and 25 cycles of 94 °C for 30 seconds, 58 
°C for 30 seconds and 72 °C for 60 seconds. The primers used were: 
5'-TTGTTCCTGCGGCTGCAGCAATTGAACCG-3' (forward) [SEQ. ID. NO. 8] and 
5'-TGCCGGTCGACTCTAGATTACTGAGTAACG-3' (reverse) [SEQ. ID. NO. 9]. 
The percentage of inactive clones was between 30 and 40%. 
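The three thermocycling programs described for Approaches A and B can be written down as data and compared. The sketch below is one way to do so: it sums the programmed hold times (ramp times between temperatures are ignored, which is a simplification) and converts enzyme units per reaction into units per ml, the unit used in TABLE 3. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


@dataclass(frozen=True)
class Program:
    name: str
    initial: tuple   # steps run once before cycling
    cycle: tuple     # steps repeated in every cycle
    cycles: int

    def hold_seconds(self) -> int:
        """Total programmed hold time; ramping between steps is ignored."""
        once = sum(s.seconds for s in self.initial)
        per_cycle = sum(s.seconds for s in self.cycle)
        return once + self.cycles * per_cycle


def units_per_ml(units, volume_ul):
    """Enzyme concentration in U/ml for a reaction of volume_ul microlitres."""
    return units * 1000.0 / volume_ul


EP_PCR_A = Program("error-prone PCR, Approach A", (),
                   (Step(94, 30), Step(50, 30), Step(72, 60)), 30)
STEP_A = Program("StEP recombination, Approach A", (Step(95, 180),),
                 (Step(94, 30), Step(58, 10)), 100)
EP_PCR_B = Program("error-prone PCR, Approach B", (Step(94, 120),),
                   (Step(94, 30), Step(58, 30), Step(72, 60)), 25)

if __name__ == "__main__":
    for program in (EP_PCR_A, STEP_A, EP_PCR_B):
        print(f"{program.name}: {program.hold_seconds() / 60:.1f} min of hold time")
    print("2.5 U in 100 ul =", units_per_ml(2.5, 100), "U/ml")
```

Note that the StEP program has no separate 72 °C extension step; its short combined annealing and extension step is what distinguishes it from the error-prone PCR programs.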

B. Screening of GAO Libraries 

GAO activity was screened in 96-well plates, using the methods of Approaches A 
and B, respectively, as described in Example 1(D). 

C. Laboratory Evolution of GAO 

The thermal stability curves of selected GAO variants are shown in FIG. 16. 
Variants were grown in test tubes (3 ml cultures). Following centrifugation and 
resuspension of the cell pellets in NaPi buffer, pH 7.0, containing CuSO4, the cells were 
lysed. Aliquots of the cell extracts were heated at each temperature for 10 min and then 
cooled down on ice for 10 min before the residual activity toward D-galactose was 
determined at room temperature. 

Results of the laboratory evolution of GAO to increase activity and thermostability 
are listed in TABLE 4. T50 is an operational measure of stability and is defined as the 
temperature at which the enzyme loses 50% of its activity following incubation for a set 
time. 
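Given a residual-activity curve of the kind described above, this temperature can be estimated by interpolating between the two measurements that bracket 50% residual activity. The sketch below uses linear interpolation, which is an assumption (the text does not say how the value was read from the curves), and the data points in the example are invented purely to exercise the function.

```python
def t50(temps_c, residual_activity, threshold=0.5):
    """Estimate the temperature at which residual activity falls to threshold.

    temps_c must be in ascending order; residual_activity holds the fraction
    of the unheated activity remaining after incubation at each temperature.
    Returns None if the curve never crosses the threshold.
    """
    points = list(zip(temps_c, residual_activity))
    for (t_lo, a_lo), (t_hi, a_hi) in zip(points, points[1:]):
        if a_lo >= threshold >= a_hi and a_lo != a_hi:
            # Linear interpolation between the two bracketing measurements.
            return t_lo + (a_lo - threshold) * (t_hi - t_lo) / (a_lo - a_hi)
    return None


if __name__ == "__main__":
    # Hypothetical measurements, not data from this document.
    temps = [30, 40, 50, 60]
    activity = [1.0, 0.9, 0.1, 0.0]
    print("estimated T50 (deg C):", t50(temps, activity))
```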
Wild-type GAO (pGAO-036) was used as the parent for generation A1 of GAO variants. 
After screening about 1500 clones, three mutants, 9.16.8D2, 9.16.6C11 and 9.16.16D12, 
were identified as more active toward allyl alcohol and/or galactose. Clone 9.16.16D12, 
which was also more thermostable than wild-type GAO, was used to parent generation A2 
of GAO variants. Four improved mutants were identified in this library following 
screening of about 1500 clones: 11.03.6D3, 11.03.10C3, 11.03.10D6 and 11.03.13E12. 
These clones were more active than the parent toward allyl alcohol and galactose. Clone 
11.03.10C3 was substantially more thermostable than the parent, as well. These four 
improved variants were recombined by StEP in generation A3. Screening of about 2000 
clones led to the identification of variant 1.06.20E7 which shows about a 200-fold 
increased activity toward allyl alcohol and D-galactose and exhibits about a 12 °C higher 
T50 with respect to wild-type GAO. 

Wild-type GAO (pGAO-036) was used as the parent for generation B1 of GAO 
variants. After screening about 900 clones, variant 1.D4 was identified as more active 
toward galactose and used to parent generation B2. Mutant 2.G4 was identified as more 
active toward galactose in this library following screening of about 1500 clones. Library 
B3 of GAO variants was generated using 2.G4 as the parent, and clone 3.H7 was identified 
as an improved variant after screening about 1500 clones. Finally, library B4 was created 
using 3.H7 as the parent and about 1500 clones were screened. Variant 4.F12 was 
identified as about 15-fold more active toward galactose relative to wild-type GAO. 
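The contribution of each round in Approach B can be read off by dividing each variant's relative activity by that of its parent. The sketch below does this with the D-galactose activities listed for generations B1 through B4 in TABLE 4 (relative to wild-type = 1.0); the function name is hypothetical.

```python
# (variant, relative activity toward D-galactose), values from TABLE 4
LINEAGE_B = [("wild-type", 1.0), ("1.D4", 2.4), ("2.G4", 4.0),
             ("3.H7", 8.6), ("4.F12", 15.2)]


def stepwise_gains(lineage):
    """Return [(variant, fold improvement over its immediate parent)]."""
    return [(child, activity / parent_activity)
            for (_, parent_activity), (child, activity)
            in zip(lineage, lineage[1:])]


if __name__ == "__main__":
    for variant, gain in stepwise_gains(LINEAGE_B):
        print(f"{variant}: {gain:.2f}-fold over parent")
```

Each round contributes a gain of roughly 1.7-fold to 2.4-fold, and the product of the four steps gives the overall improvement of about 15-fold.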

D. Active and Thermostable Mutations 

Most beneficial mutations occur in domains II and III of the GAO gene (residues 
156-532 and 533-639, respectively) (87). Mutation V494A, which was identified several 
times in the screen, is located at the bottom of the active site adjacent to the copper ligand 
Y495. Its presence increases the binding affinity for galactose approximately 3-fold. 
N535D is found in a solvent-exposed loop in domain III. The amino acid substitution 
G195E is largely responsible for the observed increase in thermostability of variant 
1.06.20E7 relative to wild-type. See FIG. 16 and TABLE 4. 

It should also be noted that a large number of mutations (five in these experiments) 
resulted from the substitution of a neutral residue by a negatively charged residue. This 
tends to decrease the isoelectric point of GAO in the mutants (the pI of wild-type GAO is 
12). A decrease in pI is advantageous, in that it may lead to fewer interactions between the 
mutant GAO and other macromolecules, and lower adhesion to glass. It may also permit 
increased use of crude galactose oxidase preparations in organic synthesis (107). 



TABLE 4 

Mutations identified in GAO variants and their effects on GAO properties. 
Activities are relative to wild-type (WT); "-" indicates that no value is listed. 

Gen. GAO name     Nucleotide base substitution               Amino acid substitution                   Rel. activity,   Rel. activity,  T50 
                                                                                                       allyl alcohol*   D-galactose     (°C) 
0    pGAO-036     N/A (WT)                                   N/A (WT)                                  1.0              1.0             42 
A1   9.16.8D2     A1609G                                     N537D                                     2.6              4.6             - 
A1   9.16.6C11    T1481C, T1543A                             V494A, C515S                              2.8              1.3             - 
A1   9.16.16D12   T1481C, T408C                              V494A, P136                               3.0              4.9             44 
A2   11.03.6D3    T1481C, T408C, T28C                        V494A, P136, S10P                         6.4              11              - 
A2   11.03.10C3   T1481C, T408C, G584A                       V494A, P136, G195E                        3.8              9.6             54 
A2   11.03.10D6   T1481C, T408C, A936G, A1603G, T654C        V494A, P136, L312, N535D, T218            5.4              11              - 
A2   11.03.13E12  T1481C, T408C, A208G                       V494A, P136, M70V                         5.1              -               - 
A3   1.06.20E7    T1481C, T28C, T408C, A208G, G584A, A1603G  V494A, S10P, P136, M70V, G195E, N535D     20               55              54 
B1   1.D4         A1237G                                     N413D                                     -                2.4             - 
B2   2.G4         A1237G, T1650A                             N413D, S550                               -                4.0             - 
B3   3.H7         A1237G, T1650A, T1481C                     N413D, S550, V494A                        -                8.6             - 
B4   4.F12        A1237G, T1650A, T1481C, T1830A             N413D, S550, V494A, S610                  -                15.2            - 

*Allyl alcohol is oxidized by wild-type GAO at ca. 3% the rate of galactose oxidation. 
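The nucleotide and amino acid labels in TABLE 4 are related by simple codon arithmetic. The sketch below maps a nucleotide position to its residue number and position within the codon, assuming that nucleotide 1 is the first base of the codon for residue 1; that numbering is an assumption, although it reproduces every pairing listed in the table. The function names are hypothetical.

```python
import re


def parse_substitution(label):
    """Split a label such as 'T1481C' into (original base, position, new base)."""
    match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", label)
    if match is None:
        raise ValueError(f"not a nucleotide substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def codon_location(nt_position):
    """Return (residue number, position within codon 1-3) for a 1-based base."""
    residue = (nt_position - 1) // 3 + 1
    codon_position = (nt_position - 1) % 3 + 1
    return residue, codon_position


# Nucleotide substitution -> residue number reported in TABLE 4
TABLE_4_PAIRS = {
    "A1609G": 537, "T1481C": 494, "T1543A": 515, "T408C": 136, "T28C": 10,
    "G584A": 195, "A936G": 312, "A1603G": 535, "T654C": 218, "A208G": 70,
    "A1237G": 413, "T1650A": 550, "T1830A": 610,
}

if __name__ == "__main__":
    for label, reported in TABLE_4_PAIRS.items():
        _, position, _ = parse_substitution(label)
        residue, codon_position = codon_location(position)
        print(f"{label}: residue {residue} (reported {reported}), "
              f"codon position {codon_position}")
```

The substitutions at positions 408, 654, 936, 1650 and 1830, which leave the residue unchanged, all fall on the third base of their codon under this numbering.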
Mutations identified at residues A3, L312, T218, P136, S550 and S610 are 
synonymous and, without being bound by theory, the observed increase in activity is 
probably due to higher expression of GAO in E. coli. Given the low expression level of 
recombinant wild-type GAO (less than 3% of total intracellular protein as determined by 
SDS-PAGE), this is a much needed improvement. 

The variants identified also exhibit increased activity toward a variety of GAO 
substrates. Mutant 1.06.20E7 is about 200-fold more active toward 3-pyridylcarbinol and 
mutant 4.F12 is about 15-fold more active toward glycerol, xylitol, beta-D-lactose, and 
IPTG. 

The sequences of representative mutants of the invention identified in TABLE 4 
are shown in FIGS. 17-28. 

As shown in the above Examples, the galactose oxidase gene can be expressed in 
E. coli in relatively high yield, with an increased activity toward at least one substrate. In 
certain embodiments the activity is greatly increased toward several substrates. In certain 
embodiments the mutants exhibit thermostability. 

The inducible promoters Plac or Ptac were effective for expression of the galactose 
oxidase gene and are preferred. Much higher expression may be possible when other 
strong promoters are used. However, some strong promoters may be counterproductive. 
For example, E. coli did not grow well when the T7 promoter, which is stronger than the lac 
promoter, was used for expression of the galactose oxidase gene. Double promoters, 
Plac-Plac or Plac-Ptac, were selected to express the galactose oxidase gene. Double 
promoters expressed the gene more strongly than a single promoter, as shown by comparing 
pGAO-025 and pGAO-011. Triple promoters expressed the gene as well as double promoters. The upper 
promoter of the double promoters seemed to be less effective than the lower promoter in the 
Examples. Therefore, double promoters of Plac-Plac or Plac-Ptac are preferred. 
Induction of the gene by IPTG was necessary when the lac promoter or tac promoter was used. 
The timing of induction and the subsequent incubation time were optimized. 

In these experiments the fused form of GAO (i.e. as a fusion protein with lacZ) 
was not found to provide advantages, and was not necessary to express the fungal gene. 

Galactose oxidase generally had reduced activity or lost its activity when codons 
were altered or when it was produced as a fused enzyme with a His-tag. Culture conditions 
were also important for production of the enzyme. 

Galactose oxidase was engineered by directed evolution to produce variants that are more 
active toward natural and additional substrates. Activity of the present mutants was as 
high as about 65 times that of wild-type GAO. Mutants of the invention also are more 
stable than wild-type, and in particular exhibit improved thermal stability. 
CLAIMS 

What is claimed is: 



1. A method of obtaining and improving the production of a functional galactose oxidase 
polypeptide by a host cell comprising the steps of: 
   (a) providing at least one parent galactose oxidase polynucleotide encoding a 
       parent galactose oxidase polypeptide, 
   (b) altering the nucleotide sequence of the parent polynucleotide by random 
       mutagenesis to produce a population of mutant polypeptides; 
   (c) transforming host cells to express the mutant polypeptides; 
   (d) screening for first-generation functional mutants produced by the host cells 
       and having at least one modified property; 
   (e) selecting at least one polynucleotide encoding a first-generation mutant as a 
       parent polynucleotide; and 
   (f) repeating a round of altering, transforming and screening steps at least once 
       to obtain at least one other generation of one or more mutants. 

2. The method of claim 1 wherein the method of random mutagenesis comprises an 
error-prone polymerase chain reaction. 

3. The method of claim 2, wherein the error-prone polymerase chain reaction employs 
unbalanced nucleotide concentrations. 

4. The method of claim 2, wherein the error-prone polymerase chain reaction employs 
manganese ions in a concentration of about 0 to about 500 µM. 

5. The method of claim 2, wherein the error-prone polymerase chain reaction employs 
manganese ions in a concentration of about 100 µM. 

6. The method of claim 2, wherein the polymerase chain reaction generates an error rate 
of about 1-2 mutations per polynucleotide. 

7. The method of claim 2, wherein the polymerase chain reaction generates an error rate 
of up to about six mutations per polynucleotide. 

8. The method of claim 1, wherein at least one of the altering, transforming and 
screening steps are changed in at least one repeated round. 

9. The method of claim 1, wherein the conditions for random mutagenesis in at least one 
repeated round of altering, transforming and screening are different from the 
conditions in any other round of altering, transforming and screening. 

10. The method of claim 1, wherein the host cells in at least one repeated round of 
altering, transforming and screening are different from the host cells in any other 
round of altering, transforming and screening. 

11. The method of claim 10, wherein the host cells in at least one round are bacterial cells. 

12. The method of claim 11, wherein the bacterial cells are E. coli cells. 

13. The method of claim 1, wherein at least one round of altering, transforming and 
screening comprises screening for a property of the polypeptide that was not screened 
for in another round of altering, transforming and screening. 

14. The method of claim 13, wherein at least one property is selected from the group 
consisting of enzyme activity, enzyme selectivity, enzyme stability, and enzyme yield. 

15. The method of claim 1, wherein each screening step comprises screening for one or 
more of the biological activity of the polypeptide, the selectivity of the polypeptide, 
the stability of the polypeptide, and the yield of expressed polypeptide. 

16. The method of claim 2, wherein the error rate in the altering step of at least one round 
of altering, transforming and screening is about 1-2 mutations per polynucleotide, and 
the error rate in the altering step of at least one other round is about 4-6 mutations per 
polynucleotide. 

17. The method of claim 2, wherein the polymerase chain reaction employs manganese 
ions in a concentration of about 0.35 mM. 

18. The method of claim 1, wherein screening comprises pre-screening for mutant 
colonies using nitrocellulose membranes. 

19. A polynucleotide evolved according to the method of claim 8. 

20. A polynucleotide encoding for a galactose oxidase which has a mutation in at least one 
amino acid selected from the group consisting of A3, S10, M70, P136, G195, T218, 
L312, N413, V494, C515, N535, N537, S550, and S610. 

21. A polynucleotide encoding for a galactose oxidase which has at least one amino acid 
mutation selected from the group consisting of S10P, M70V, G195E, N413D, V494A, 
C515S, N535D, and N537D. 

22. A polynucleotide encoding for a galactose oxidase which has the amino acid mutation 
N537D. 

23. A polynucleotide encoding for a galactose oxidase which has the amino acid mutation 
V494A. 

24. The polynucleotide of claim 23, further comprising the amino acid mutation C515S. 

25. The polynucleotide of claim 23, further comprising the amino acid mutation S10P. 

26. The polynucleotide of claim 23, further comprising a silent mutation at P136. 

27. The polynucleotide of claim 25, further comprising a silent mutation at P136. 

28. The polynucleotide of claim 23, further comprising the amino acid mutation G195E. 

29. The polynucleotide of claim 28, further comprising a silent mutation in at least one of 
A3 and P136. 

30. The polynucleotide of claim 23, further comprising the amino acid mutation N535D. 

31. The polynucleotide of claim 30, further comprising a silent mutation in at least one of 
P136, L312, and T218. 

32. The polynucleotide of claim 23, further comprising the amino acid mutation M70V. 

33. The polynucleotide of claim 32, further comprising a silent mutation at P136. 

34. A polynucleotide encoding for a galactose oxidase which has the amino acid 
mutations V494A, S10P, M70V, G195E and N535D. 

35. The polynucleotide of claim 34, further comprising a silent mutation at P136. 

36. A polynucleotide encoding for a galactose oxidase which has the amino acid mutation 
N413D. 

37. The polynucleotide of claim 36, further comprising a silent mutation at S550. 

38. The polynucleotide of claim 23, further comprising the amino acid mutation N413D. 

39. The polynucleotide of claim 38, further comprising a silent mutation in at least one of 
S550 and S610. 

40. A polynucleotide encoding for a galactose oxidase which has a nucleotide mutation 
in at least one of positions 9, 28, 208, 408, 584, 654, 830, 936, 1237, 1481, 1543, 
1603, 1609, 1650, and 1830. 

41. The polynucleotide of claim 40, wherein the mutation at any of positions 9, 408, 654, 
936, 1650 and 1830 is a silent mutation. 

42. The polynucleotide of claim 40 which has a mutation in at least one of nucleotide 
positions 28, 408, 654, and 1481, wherein a thymine is replaced by a cytosine. 

43. The polynucleotide of claim 40, which has a mutation in at least one of nucleotide 
positions 1543, 1650 and 1830, wherein a thymine is replaced by an adenine. 

44. The polynucleotide of claim 40, which has a mutation in at least one of nucleotide 
positions 206, 936, 1237, 1603, and 1609, wherein adenine is replaced by guanine. 

45. A polynucleotide encoding for a galactose oxidase which has at least one nucleotide 
mutation in a region encompassed by nucleotides selected from the group consisting 
of: 
   (a) 1 through 30; 
   (b) 200 through 700; 
   (c) 800 through 1000; and 
   (d) 1200 through 1650. 

46. The polynucleotide of claim 45, which has a nucleotide mutation in a region 
encompassed by nucleotides 1-30, wherein a thymine is replaced by a cytosine. 

47. The polynucleotide of claim 45, which has a nucleotide mutation in a region 
encompassed by nucleotides 1450-1550, wherein a thymine is replaced by one of a 
cytosine and an adenine. 

48. The polynucleotide of claim 45, which has a nucleotide mutation in a region 
encompassed by nucleotides 1200-1250, wherein an adenine is replaced by a guanine. 

49. The polynucleotide of claim 45, which has a nucleotide mutation in a region 
encompassed by nucleotides 1600-1650, wherein an adenine is replaced by a guanine. 

50. The polynucleotide of claim 45, which has a nucleotide mutation in a region 
proximate to and encompassing nucleotide 208, wherein an adenine is replaced by a 
guanine. 

51. The polynucleotide of claim 45, which has a nucleotide mutation in a region 
proximate to and encompassing nucleotide 585, wherein a guanine is replaced by an 
adenine. 

52. The polynucleotide of claim 45, which has a nucleotide mutation in a region 
proximate to and encompassing nucleotide 1543, wherein a thymine is replaced by an 
adenine. 

53. A polynucleotide encoding for a galactose oxidase which has at least one of the 
nucleotide mutations A9C, T28C, A208G, T408C, G584A, T654C, A936G, A1237G, 
T1481C, T1543A, A1603G, A1609G, T1650A, and T1830A. 

54. The polynucleotide of claim 53, which has the nucleotide mutation T1481C. 

55. The polynucleotide of claim 54, further comprising the nucleotide mutation T1543A. 

56. The polynucleotide of claim 54, further comprising the nucleotide mutation T408C. 

57. The polynucleotide of claim 56, further comprising a nucleotide mutation selected 
from the group consisting of G584A, A1603G, and A208G. 

58. The polynucleotide of claim 56, further comprising at least one of the nucleotide 
mutations A9C, A936G, and T654C. 

59. The polynucleotide of claim 56, further comprising the nucleotide mutations T28C, 
A208G, G584A and A1603G. 

60. The polynucleotide of claim 53 which has the nucleotide mutation A1237G. 

61. The polynucleotide of claim 60, further comprising at least one of the nucleotide 
mutations selected from the group consisting of T1650A, T1830A, and T1481C. 

62. The polynucleotide of claim 61, having the nucleotide mutations A1237G, T1650A, 
T1481C and T1830A. 

63. A galactose oxidase which has a mutation in at least one amino acid selected from the 
group consisting of A3, S10, M70, P136, G195, T218, L312, N413, V494, C515, 
N535, N537, S550, and S610. 

64. A galactose oxidase which has at least one of the amino acid mutations S10P, M70V, 
G195E, N413D, V494A, C515S, N535D, and N537D. 

65. The galactose oxidase of claim 64, which has the amino acid mutation N537D. 

66. The galactose oxidase of claim 64, which has the amino acid mutation V494A. 

67. The galactose oxidase of claim 66, further comprising the amino acid mutation 
C515S. 

68. The galactose oxidase of claim 66, further comprising the amino acid mutation S10P. 

69. The galactose oxidase of claim 66, further comprising a silent mutation at P136. 

70. The galactose oxidase of claim 68, further comprising a silent mutation at P136. 

71. The galactose oxidase of claim 66, further comprising the amino acid mutation 
G195E. 

72. The galactose oxidase of claim 71, further comprising a silent mutation in at least one 
of A3 and P136. 

73. The galactose oxidase of claim 66, further comprising the amino acid mutation 
N535D. 

74. The galactose oxidase of claim 73, further comprising a silent mutation in at least one 
of P136, L312, and T218. 

75. The galactose oxidase of claim 66, further comprising the amino acid mutation M70V. 

76. The galactose oxidase of claim 75, further comprising a silent mutation at P136. 

77. The galactose oxidase of claim 64, which has the amino acid mutations S10P, M70V, 
G195E, V494A and N535D. 

78. The galactose oxidase of claim 77, further comprising a silent mutation at P136. 

79. The galactose oxidase of claim 64, which has the amino acid mutation N413D. 

80. The galactose oxidase of claim 80, further comprising a silent mutation at S550. 

81. The galactose oxidase of claim 66, further comprising the amino acid mutation 
N413D. 

82. The galactose oxidase of claim 81, further comprising a silent mutation in at least one 
of S550 and S610. 

AMENDED CLAIMS 

[received by the International Bureau on 15 October 2001 (15.10.01); 
original claims 1-82 replaced by new claims 1-130 (14 pages)] 

1. A polynucleotide evolved according to a method, wherein the method comprises 
the steps of: 
   (i) providing at least one parent galactose oxidase polynucleotide encoding 
       a parent galactose oxidase polypeptide, 
   (ii) altering the nucleotide sequence of the parent polynucleotide by random 
       mutagenesis to produce a population of mutant polypeptides; 
   (iii) transforming host cells to express the mutant polypeptides; 
   (iv) screening for first-generation functional mutants produced by the host 
       cells and having at least one modified property; 
   (v) selecting at least one polynucleotide encoding a first-generation mutant 
       as a parent polynucleotide; and 
   (vi) repeating a round of altering, transforming and screening steps at least 
       once to obtain at least one other generation of one or more mutants, 
wherein at least one of the altering, transforming and screening steps are changed 
in at least one repeated round. 

2. A polynucleotide encoding for a galactose oxidase, wherein the galactose oxidase 
has a mutation in at least one amino acid selected from the group consisting of 
A3, M70, P136, G195, T218, L312, N413, V494, C515, N535, N537, S550, S10, and 
S610. 

3. A polynucleotide encoding for a galactose oxidase, wherein the galactose oxidase 
has a mutation in at least one amino acid selected from the group consisting of 
M70V, G195E, N413D, V494A, C515S, N535D, N537D, and S10P. 

4. A polynucleotide encoding for a galactose oxidase, wherein the galactose oxidase 
has the amino acid mutation N537D. 

5. A polynucleotide encoding for a galactose oxidase, wherein the galactose oxidase 
has the amino acid mutation V494A. 

6. The polynucleotide of claim 5, wherein the galactose oxidase further comprises 
the amino acid mutation C515S. 

7. The polynucleotide of claim 5, wherein the galactose oxidase further comprises 
the amino acid mutation S10P. 

8. The polynucleotide of claim 5, wherein the galactose oxidase further comprises 
a silent mutation at P136. 

9. The polynucleotide of claim 7, wherein the galactose oxidase further comprises 
a silent mutation at P136. 

10. The polynucleotide of claim 5, wherein the galactose oxidase further comprises 
the amino acid mutation G195E. 

11. The polynucleotide of claim 10, wherein the galactose oxidase further comprises 
a silent mutation in at least one of A3 and P136. 

12. The polynucleotide of claim 5, wherein the galactose oxidase further comprises 
the amino acid mutation N535D. 

13. The polynucleotide of claim 12, wherein the galactose oxidase further comprises 
a silent mutation in at least one of P136, L312, and T218. 

14. The polynucleotide of claim 5, wherein the galactose oxidase further comprises 
the amino acid mutation M70V. 

15. The polynucleotide of claim 14, wherein the galactose oxidase further comprises 
a silent mutation at P136. 

16. A polynucleotide encoding for a galactose oxidase, wherein the galactose oxidase 
has the amino acid mutations V494A, S10P, M70V, G195E and N535D. 

17. The polynucleotide of claim 16, wherein the galactose oxidase further comprises 
a silent mutation at P136. 

18. A polynucleotide encoding for a galactose oxidase, wherein the galactose oxidase 
has the amino acid mutation N413D. 

19. The polynucleotide of claim 18, wherein the galactose oxidase further comprises 
a silent mutation at S550. 

20. The polynucleotide of claim 5, wherein the galactose oxidase further comprises 
the amino acid mutation N413D. 

21. The polynucleotide of claim 20, wherein the galactose oxidase further comprises 
a silent mutation in at least one of S550 and S610. 

22. A polynucleotide encoding for a galactose oxidase, wherein the polynucleotide 
has a nucleotide mutation in at least one position selected from the group 
consisting of 9, 28, 208, 408, 584, 654, 830, 936, 1237, 1481, 1543, 1603, 1609, 
1650, and 1830. 

1 23. The polynucleotide of claim 22, wherein the mutation at any of positions 9, 408, 

2 654, 936, 1650 and 1830 is a silent mutation. 

1 24. A polynucleotide encoding for a galactose oxidase, wherein the polynucleotide

2 has a nucleotide mutation in at least one position of a region encompassed by 

3 nucleotides 1 through 30. 

1 25. A polynucleotide encoding for a galactose oxidase, wherein the polynucleotide 

2 has a nucleotide mutation in at least one position of a region encompassed by 

3 nucleotides 200 through 700. 

1 26. A polynucleotide encoding for a galactose oxidase, wherein the polynucleotide 

2 has a nucleotide mutation in at least one position of a region encompassed by 

3 nucleotides 800 through 1000. 

1 27. A polynucleotide encoding for a galactose oxidase, wherein the polynucleotide 

2 has a nucleotide mutation in at least one position of a region encompassed by

3 nucleotides 1200 through 1650. 

1 28. A polynucleotide encoding for a galactose oxidase, wherein the polynucleotide 

2 has a mutation in at least one position within a region encompassed by 

3 nucleotides 1-30, wherein a thymine is replaced by a cytosine.

1 29. A polynucleotide encoding for a galactose oxidase, wherein the polynucleotide 

2 has a mutation in at least one position within a region encompassed by 

3 nucleotides 1450-1550, wherein thymine is replaced by one of cytosine and 

4 adenine. 

1 30. A polynucleotide encoding for a galactose oxidase, wherein the polynucleotide 

2 has a mutation in at least one position within a region encompassed by 

3 nucleotides 1200-1250, wherein adenine is replaced by guanine. 

1 31. A polynucleotide encoding for a galactose oxidase, wherein the polynucleotide

2 has a mutation in at least one position within a region encompassed by 

3 nucleotides 1600-1650, wherein adenine is replaced by guanine. 

1 32. A polynucleotide encoding for a galactose oxidase, wherein the polynucleotide 

2 has a mutation in at least one position within a region proximate to and

3 encompassing nucleotide 208, wherein adenine is replaced by guanine. 

1 33. A polynucleotide encoding for a galactose oxidase, wherein the polynucleotide 

2 has a mutation in at least one position within a region proximate to and 

3 encompassing nucleotide 584, wherein guanine is replaced by adenine. 

1 34. A polynucleotide encoding for a galactose oxidase, wherein the polynucleotide 

2 has a mutation in at least one position within a region proximate to and 

3 encompassing nucleotide 1543, wherein thymine is replaced by adenine.

1 35. A polynucleotide encoding for a galactose oxidase, wherein the polynucleotide 

2 has a mutation in at least one of nucleotide positions 28, 408, 654, and 1481, 

3 wherein thymine is replaced by cytosine. 

1 36. A polynucleotide encoding for a galactose oxidase, wherein the polynucleotide 

2 has a mutation in at least one of nucleotide positions 1543, 1650 and 1830, 

3 wherein thymine is replaced by adenine. 

1 37. A polynucleotide encoding for a galactose oxidase, wherein the polynucleotide 

2 has a mutation in at least one of nucleotide positions 208, 936, 1237, 1603, and 

3 1609, wherein adenine is replaced by guanine. 

1 38. A polynucleotide encoding for a galactose oxidase, wherein the polynucleotide 

2 has at least one of the nucleotide mutations A1609G, T1481C, T1543A, T408C, 

3 T28C, G584A, A9C, A936G, A1603G, T654C, A208G, A1237G, T1650A, and

4 T1830A. 

1 39. A polynucleotide encoding for a galactose oxidase, wherein the polynucleotide 

2 has the nucleotide mutation A1609G.

1 40. A polynucleotide encoding for a galactose oxidase, wherein the polynucleotide 

2 has the nucleotide mutation T1481C. 

1 41. The polynucleotide of claim 40, further comprising the nucleotide mutation 

2 T1543A. 

1 42. The polynucleotide of claim 40, further comprising the nucleotide mutation 

2 T408C.

1 43. The polynucleotide of claim 42, further comprising the nucleotide mutation 

2 G584A. 

1 44. The polynucleotide of claim 42, further comprising the nucleotide mutation 

2 A1603G. 

1 45. The polynucleotide of claim 42, further comprising the nucleotide mutation 

2 A208G. 

1 46. The polynucleotide of claim 42, further comprising at least one of the nucleotide

2 mutations A9, A936G, and T654C. 

1 47. The polynucleotide of claim 42, further comprising the nucleotide mutations 

2 T28C, A208G, G584A and A1603G. 

1 48. A polynucleotide encoding a galactose oxidase, wherein the polynucleotide has

2 the nucleotide mutation A1237G. 

1 49. The polynucleotide of claim 48, further comprising at least one of the nucleotide 

2 mutations T1650A and T1830A.

1 50. The polynucleotide of claim 48, further comprising the nucleotide mutation 

2 T1481C. 

1 51. A polynucleotide encoding a galactose oxidase, wherein the polynucleotide has 

2 the nucleotide mutations A1237G, T1650A, T1481C and T1830A.

1 52. A polynucleotide encoding for a galactose oxidase which has a mutation in at 

2 least one amino acid selected from the group consisting of A3, S10, M70, P136, 

3 T218, L312, N413, C515, N535, N537, S550, and S610. 

1 53. The polynucleotide encoding for a galactose oxidase of claim 52, wherein the 

2 galactose oxidase further comprises at least one amino acid mutation selected 

3 from the group consisting of G195 and V494.

1 54. The polynucleotide of claim 52, wherein the galactose oxidase has at least a 

2 mutation selected from the group consisting of S10P, M70V, N413D, C515S, 

3 N535D, and N537D.

1 55. The polynucleotide of claim 54, wherein the galactose oxidase further comprises 

2 at least one amino acid mutation selected from the group consisting of G195E and

3 V494A. 

1 56. A polynucleotide encoding for a galactose oxidase, wherein the galactose oxidase 

2 has a mutation in amino acid N537. 

1 57. A polynucleotide encoding for a galactose oxidase, wherein the galactose oxidase

2 has a mutation in amino acid N537. 

1 58. The polynucleotide of claim 57, wherein the mutation is N537D. 

1 59. A polynucleotide encoding for a galactose oxidase, wherein the galactose oxidase 

2 has mutations in amino acids V494 and C515.

1 60. The polynucleotide of claim 59, wherein the mutations are V494A and C515S. 

1 61. A polynucleotide encoding for a galactose oxidase, wherein the galactose oxidase 

2 has mutations in amino acids V494 and P136. 

1 62. The polynucleotide of claim 61, wherein the V494 mutation is V494A.

1 63. A polynucleotide encoding for a galactose oxidase, wherein the galactose oxidase

2 has mutations in amino acids V494, P136, and S10.

1 64. The polynucleotide of claim 63, wherein the V494 mutation is V494A, and the 

2 S10 mutation is S10P. 

1 65. A polynucleotide encoding for a galactose oxidase, wherein the galactose oxidase 

2 has mutations in amino acids V494, P136, G195, and A3. 

1 66. The polynucleotide of claim 65, wherein the V494 mutation is V494A, and the 

2 G195 mutation is G195E. 

1 67. A polynucleotide encoding for a galactose oxidase, wherein the galactose oxidase 

2 has mutations in amino acids V494, P136, L312, N535, and T218. 

1 68. The polynucleotide of claim 67, wherein the V494 mutation is V494A, and the

2 N535 mutation is N535D. 

1 69. A polynucleotide encoding for a galactose oxidase, wherein the galactose oxidase 

2 has mutations in amino acids V494, P136, and M70.

1 70. The polynucleotide of claim 69, wherein the V494 mutation is V494A, and the 

2 M70 mutation is M70V. 

1 71. A polynucleotide encoding for a galactose oxidase, wherein the galactose oxidase

2 has mutations in amino acids V494, S10, P136, M70, G195, and N535.

1 72. The polynucleotide of claim 71, wherein the V494 mutation is V494A, the S10 

2 mutation is S10P, the M70 mutation is M70V, the G195 mutation is G195E, and 

3 the N535 mutation is N535D. 

1 73. A polynucleotide encoding for a galactose oxidase, wherein the galactose oxidase 

2 has a mutation in amino acid N413.

1 74. The polynucleotide of claim 73, wherein the mutation is N413D. 

1 75. A polynucleotide encoding for a galactose oxidase, wherein the galactose oxidase

2 has a mutation in amino acids N413 and S550.

1 76. The polynucleotide of claim 75, wherein the N413 mutation is N413D. 

1 77. A polynucleotide encoding for a galactose oxidase, wherein the galactose oxidase 

2 has a mutation in amino acids N413, S550, and V494.

1 78. The polynucleotide of claim 77, wherein the N413 mutation is N413D, and the

2 V494 mutation is V494A. 

1 79. A polynucleotide encoding for a galactose oxidase, wherein the galactose oxidase

2 has mutations in amino acids N413, S550, V494, and S610. 

1 80. The polynucleotide of claim 79, wherein the N413 mutation is N413D, and the 

2 V494 mutation is V494A. 

1 81. A polynucleotide having a sequence selected from the group consisting of SEQ 

2 ID NOS: 37-48.

1 82. A galactose oxidase which has a mutation in at least one amino acid selected from

2 the group consisting of A3, S10, M70, P136, G195, T218, L312, N413, V494,

3 C515, N535, N537, S550, and S610. 

1 83. A galactose oxidase which has at least one of the amino acid mutations S10P, 

2 M70V, G195E, N413D, V494A, C515S, N535D, and N537D.

1 84. The galactose oxidase of claim 83, which has the amino acid mutation N537D. 

1 85. The galactose oxidase of claim 83, which has the amino acid mutation V494A. 

1 86. The galactose oxidase of claim 85, further comprising the amino acid mutation 

2 C515S. 

1 87. The galactose oxidase of claim 85, further comprising the amino acid mutation

2 S10P. 

1 88. The galactose oxidase of claim 85, further comprising a silent mutation at P136. 

1 89. The galactose oxidase of claim 87, further comprising a silent mutation at P136.

1 90. The galactose oxidase of claim 85, further comprising the amino acid mutation 

2 G195E. 

1 91. The galactose oxidase of claim 90, further comprising a silent mutation in at least 

2 one of A3 and P136.

1 92. The galactose oxidase of claim 85, further comprising the amino acid mutation 

2 N535D. 

1 93. The galactose oxidase of claim 92, further comprising a silent mutation in at least

2 one of P136, L312, and T218.

1 94. The galactose oxidase of claim 85, further comprising the amino acid mutation 

2 M70V. 

1 95. The galactose oxidase of claim 94, further comprising a silent mutation at P136.

1 96. The galactose oxidase of claim 83, which has the amino acid mutations S10P,

2 M70V, G195E, V494A and N535D. 

1 97. The galactose oxidase of claim 96, further comprising a silent mutation at P136.

1 98. The galactose oxidase of claim 83, which has the amino acid mutation N413D. 

1 99. The galactose oxidase of claim 98, further comprising a silent mutation at S550. 

1 100. The galactose oxidase of claim 85, further comprising the amino acid mutation

2 N413D. 

1 101. The galactose oxidase of claim 100, further comprising a silent mutation in at 

2 least one of S550 and S610. 

1 102. A galactose oxidase which has a mutation in at least one amino acid selected from

2 the group consisting of A3, S10, M70, P136, T218, L312, N413, C515, N535,

3 N537, S550, and S610.

1 103. The galactose oxidase of claim 102, further comprising at least one amino acid 

2 mutation selected from the group consisting of G195 and V494. 

1 104. The galactose oxidase of claim 102, wherein the mutation is selected from the

2 group consisting of S10P, M70V, N413D, C515S, N535D, and N537D.

1 105. The galactose oxidase of claim 104, further comprising at least one amino acid 

2 mutation selected from the group consisting of G195E and V494A.

1 106. A galactose oxidase which has a mutation in amino acid N537.

1 107. The galactose oxidase of claim 106, wherein the mutation is N537D.

1 108. A galactose oxidase which has mutations in amino acids V494 and C515.

1 109. The galactose oxidase of claim 108, wherein the mutations are V494A and

2 C515S. 

1 110. A galactose oxidase which has mutations in amino acids V494 and P136.

1 111. The galactose oxidase of claim 110, wherein the V494 mutation is V494A.

1 112. A galactose oxidase which has mutations in amino acids V494, P136, and S10.

1 113. The galactose oxidase of claim 112, wherein the V494 mutation is V494A, and

2 the S10 mutation is S10P. 

1 114. A galactose oxidase which has mutations in amino acids V494, P136, G195, and

2 A3. 

1 115. The galactose oxidase of claim 114, wherein the V494 mutation is V494A, and

2 the G195 mutation is G195E.

1 116. A galactose oxidase which has mutations in amino acids V494, P136, L312,

2 N535, and T218.

1 117. The galactose oxidase of claim 116, wherein the V494 mutation is V494A, and 

2 the N535 mutation is N535D. 

1 118. A galactose oxidase which has mutations in amino acids V494, P136, and M70.

1 119. The galactose oxidase of claim 118, wherein the V494 mutation is V494A, and

2 the M70 mutation is M70V. 

1 120. A galactose oxidase which has mutations in amino acids V494, S10, P136, M70,

2 G195, and N535.

1 121. The galactose oxidase of claim 120, wherein the V494 mutation is V494A, the

2 S10 mutation is S10P, the M70 mutation is M70V, the G195 mutation is G195E,

3 and the N535 mutation is N535D. 



AMENDED SHEET (ARTICLE 19) 



WO 01/88110 






PCT/US00/32345 



1 122. A galactose oxidase which has a mutation in amino acid N413. 

1 123. The galactose oxidase of claim 122, wherein the mutation is N413D.

1 124. A galactose oxidase which has a mutation in amino acids N413 and S550. 

1 125. The galactose oxidase of claim 124, wherein the N413 mutation is N413D. 

1 126. A galactose oxidase which has a mutation in amino acids N413, S550, and V494.

1 127. The galactose oxidase of claim 126, wherein the N413 mutation is N413D, and 

2 the V494 mutation is V494A. 

1 128. A galactose oxidase which has mutations in amino acids N413, S550, V494, and

2 S610. 

1 129. The galactose oxidase of claim 128, wherein the N413 mutation is N413D, and 

2 the V494 mutation is V494A. 

1 130. A galactose oxidase having an amino acid sequence selected from the group 

2 consisting of SEQ ID NOS: 10-21. 

FIG. 4

FIG. 6



PCR primers

MY    Sequence                                                 SEQ. ID NO.
001   5'-AAT TCG AAG CTT ATG GCC TCA GCA CCT ATC GGA AGC-3'    1
002   5'-CTT CCT TCT AGA TTA CTG AGT AAC GCG AAT CGT-3'        2
003   5'-GGA AGA GAA TTC AAT ACG CAA ACC GCC TCT-3'            3
004   5'-GGT CAT AAG CTT TTC CTG TGT GAA ATT GTT AT-3'         4
005   5'-ACC ATG ATT TCG ACG TCG GTA CCC TCA GCA-3'            5
009   5'-CTT CCT AAG CTT TCA CTG AGT AAC GCG AAT-3'            6
036   5'-GGA AGA GGT ACC AAT ACG CAA ACC GCC TCT-3'            7
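
The primers listed in FIG. 6 carry restriction sites near their 5' ends that correspond to the "cut" steps labelled in the cloning figures. The following is a minimal Python sketch, not part of the disclosure, that locates those sites in primers 001 through 004. The recognition sequences for HindIII, XbaI, EcoRI and KpnI are the standard ones and are supplied here as an assumption; the names `SITES`, `PRIMERS` and `sites_in` are purely illustrative.

```python
# Standard recognition sequences (all palindromic); supplied as an assumption,
# they are not spelled out in the figure itself.
SITES = {
    "HindIII": "AAGCTT",
    "XbaI": "TCTAGA",
    "EcoRI": "GAATTC",
    "KpnI": "GGTACC",
}

# Primer sequences as listed in FIG. 6, written 5' to 3' without spaces.
PRIMERS = {
    "001": "AATTCGAAGCTTATGGCCTCAGCACCTATCGGAAGC",
    "002": "CTTCCTTCTAGATTACTGAGTAACGCGAATCGT",
    "003": "GGAAGAGAATTCAATACGCAAACCGCCTCT",
    "004": "GGTCATAAGCTTTTCCTGTGTGAAATTGTTAT",
}


def sites_in(primer):
    """Return (enzyme, 0-based offset) for the first occurrence of each listed site."""
    hits = []
    for enzyme, site in SITES.items():
        offset = primer.find(site)
        if offset != -1:
            hits.append((enzyme, offset))
    return hits


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, sites_in(seq))
```

Under these assumptions, primers 001 and 004 each carry a HindIII site, 002 an XbaI site and 003 an EcoRI site, which is consistent with the enzyme pairs named next to each PCR step.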

FIG. 7

Plasmid map labels: KpnI, XbaI, Plac (lacZ), pUC18-EHL, 2694 bp.
Steps shown for each site: cut, blunting (T4 DNA polymerase), ligation (T4 DNA ligase).

HindIII
  original   ...AAGCTT...
             ...TTCGAA...
  cut        ...A     AGCTT...
             ...TTCGA     A...
  blunting   ...AAGCT AGCTT...
             ...TTCGA TCGAA...
  ligation   ...AAGCTAGCTT...  (SEQ. ID NO.: 22)
             ...TTCGATCGAA...  (SEQ. ID NO.: 23)

EcoRI
  original   ...GAATTC...
             ...CTTAAG...
  cut        ...G     AATTC...
             ...CTTAA     G...
  blunting   ...GAATT AATTC...
             ...CTTAA TTAAG...
  ligation   ...GAATTAATTC...  (SEQ. ID NO.: 24)
             ...CTTAATTAAG...  (SEQ. ID NO.: 25)

PstI
  original   ...CTGCAG...
             ...GACGTC...
  cut        ...CTGCA     G...
             ...G     ACGTC...
  blunting   ...C G...
             ...G C...
  ligation   ...CG...
             ...GC...
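
The cut, blunting and ligation steps of FIG. 7 can be sketched in a few lines of Python. This is an illustrative model only, under stated assumptions: the cut positions for HindIII (A^AGCTT), EcoRI (G^AATTC) and PstI (CTGCA^G) are the standard ones and are not given in the figure, and T4 DNA polymerase is taken to fill in 5' overhangs and trim 3' overhangs. The function name `blunt_and_religate` and the `ENZYMES` table are hypothetical.

```python
def blunt_and_religate(site, top_cut, bottom_cut):
    """Top-strand sequence left after cutting `site`, blunting and religating.

    top_cut and bottom_cut give the number of bases to the left of the cut on
    the top and bottom strand, both counted along the top strand (5' to 3').
    """
    lo, hi = sorted((top_cut, bottom_cut))
    if top_cut < bottom_cut:
        # 5' overhang: both recessed 3' ends are filled in, so the overhang
        # bases end up present on both sides of the new junction.
        left, right = site[:hi], site[lo:]
    else:
        # 3' overhang (or blunt cut): the overhang is trimmed back, so the
        # overhang bases are lost from both sides.
        left, right = site[:lo], site[hi:]
    return left + right


# (recognition site, top-strand cut, bottom-strand cut); standard values, assumed.
ENZYMES = {
    "HindIII": ("AAGCTT", 1, 5),
    "EcoRI": ("GAATTC", 1, 5),
    "PstI": ("CTGCAG", 5, 1),
}

if __name__ == "__main__":
    for name, (site, top, bottom) in ENZYMES.items():
        print(name, site, "->", blunt_and_religate(site, top, bottom))
```

With these inputs the model reproduces the ligation products drawn in the figure (AAGCTAGCTT, GAATTAATTC and CG), none of which still contains the original recognition site.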

[Figure: plasmid construction scheme. Only the following labels are recoverable.]

P-MY001 AAT TCG AAG CTT ATG GCC TCA GCA CCT ATC GGA AGC
P-MY002 CTT CCT TCT AGA TTA CTG AGT AAC GCG AAT CGT
P-MY003 GGA AGA GAA TTC AAT ACG CAA ACC GCC TCT
P-MY004 GGT CAT AAG CTT TTC CTG TGT GAA ATT GTT AT

PCR (Primer: P-MY001, P-MY002); cut (HindIII, XbaI)
PCR (Primer: P-MY003, P-MY004); cut (EcoRI, HindIII)
cut (HindIII, XbaI); cut (EcoRI, HindIII); cut (EcoRI, XbaI)
ligation (T4 DNA ligase)

P-MY036

ligation (T4 DNA ligase)

P-MY005 ACC ATG ATT TCG AGC TCG GTA CCC TCA GCA

ligation (T4 DNA ligase)

FIG. 11



GAO activities [units/ml-culture]



Plasmid (vector) Host strain DH5αMCR BL21(DE3) KY-14478

Induction - IPTG - IPTG IPTG 

FIG. 12

N-terminal

Plasmid



GAO activity [units/ml] (+ IPTG) 
BL21(DE3) KY-14478 



pGAO-011 

(pUC18) 
(pUC18)
pGAO-010

(pUC18)
pGAO-027 

(pUC18) 

FIG. 14

o BL21(DE3)/pGAO-010 (wild type)

• BL21(DE3)/pGAO-010M (random alternation)

FIG. 15



Substrate (100 mM) 



Relative activities of galactose oxidase [%]

Date : 2000.04.10

Mutant ID : 9.16.BD2 

Mutation : N537D (A1609G) 

Sequence Size : 1917 



FIG. 17A 



10 20 30 40 50 60 

GCC TCA GCA CCT ATC GGA AGC GCC ATT TCT CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT 
ASAPIGSAISRNNWAVTCDS

70 80 90 100 110 120 

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC 
AQSGNECNKAIDGNKDTFWH

130 140 150 160 170 180 

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG 
TFYGANGDPKPPHTYTIDMK

190 200 210 220 230 240 

ACA ACT CAG AAC GTC AAC GGC TTG TCT ATG CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC 
TTQNVNGLSMLPRQDGNQNG 

250 260 270 280 290 300 

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT 
WIGRHEVYLSSDGTNWGSPV 

310 320 330 340 350 360 

GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT 
ASGSWFADSTTKYSNFETRP

370 380 390 400 410 420

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT GAA GCG AAT GGC CAG CCT TGG ACT AGC ATT 
ARYVRLVAITEANGQPWTSI

430 440 450 460 470 480 

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC 
AEINVFQASSYTAPQPGLGR

490 500 510 520 530 540

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG 
WGPTIDLPIVPAAAAIEPTS

550 560 570 580 590 600

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC AAT GAT GCA TTT GGA GGA TCC CCT GGT GGT 
GRVLMWSSYRNDAFGGSPGG

610 620 630 640 650 660 

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC ACT GGT ATT GTT TCC GAC CGC ACT GTG ACA 
ITLTSSWDPSTGIVSDRTVT 

670 680 690 700 710 720 

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA 
VTKHDMFCPGISMDGNGQIV 

730 740 750 760 770 780 

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG 
VTGGNDAKKTSLYDSSSDSW

790 800 810 820 830 840 

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 
IPGPDMQVARGYQSSATMSD 






FIG. 17B 



850 860 870 880 890 900 

GGT CGT GTT TTT ACC ATT GGA GGC TCC TGG AGC GGT GGC GTA TTT GAG AAG AAT GGC GAA 
GRVFTIGGSWSGGVFEKNGE

910 920 930 940 950 960

GTC TAT AGC CCA TCT TCA AAG ACA TGG ACG TCC CTA CCC AAT GCC AAG GTC AAC CCA ATG 
VYSPSSKTWTSLPNAKVNPM

970 980 990 1000 1010 1020 

TTG ACG GCT GAC AAG CAA GGA TTG TAC CGT TCA GAC AAC CAC GCG TGG CTC TTT GGA TGG 
LTADKQGLYRSDNHAWLFGW 

1030 1040 1050 1060 1070 1080

AAG AAG GGT TCG GTG TTC CAA GCG GGA CCT AGC ACA GCC ATG AAC TGG TAC TAT ACC AGT 
KKGSVFQAGPSTAMNWYYTS

1090 1100 1110 1120 1130 1140

GGA AGT GGT GAT GTG AAG TCA GCC GGA AAA CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT 
GSGDVKSAGKRQSNRGVAPD 

1150 1160 1170 1180 1190 1200

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC 
AMCGNAVMYDAVKGKILTFG

1210 1220 1230 1240 1250 1260

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC AAC GCC CAC ATC ATC ACC CTC GGT 
GSPDYQDSDATTNAHIITLG

1270 1280 1290 1300 1310 1320 

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTSPNTVFASNGLYFART

1330 1340 1350 1360 1370 1380 

TTT CAC ACC TCT GTT GTT CTT CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT 
FHTSVVLPDGSTFITGGQRR

1390 1400 1410 1420 1430 1440 

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA 
GIPFEDSTPVFTPEIYVPEQ

1450 1460 1470 1480 1490 1500 

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GTC TAC CAT AGC ATT TCC CTT 
DTFYKQNPNSIVRVYHSISL

1510 1520 1530 1540 1550 1560 

TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT TGT GGC GAT TGT ACC ACG 
LLPDGRVFNGGGGLCGDCTT

1570 1580 1590 1600 1610 1620 

AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA AAC TAT CTT TAC AAT AGC GAC GGC AAT CTC 
NHFDAQIFTPNYLYNSDGNL

1630 1640 1650 1660 1670 1680

GCG ACA CGT CCC AAG ATT ACC AGA ACC TCT ACA CAG AGC GTC AAG GTC GGT GGC AGA ATT 
ATRPKITRTSTQSVKVGGRI

1690 1700 1710 1720 1730 1740

ACA ATC TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA 
TISTDSSISKASLIRYGTAT






1750 1760 1770 1780 1790 1800

CAC ACG GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT
HTVNTDQRRIPLTLTNNGGN

1810 1820 1830 1840 1850 1860

AGC TAT TCT TTC CAA GTT CCT AGC GAC TCT GGT GTT GCT TTG CCT GGC TAC TGG ATG TTG
SYSFQVPSDSGVALPGYWML

1870 1880 1890 1900 1910 1920

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT GTG GCT TCG ACG ATT CGC GVT ACT CAG
FVMNSAGVPSVASTIRVTQ



FIG. 17C
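
The figure headers pair each amino acid change with a nucleotide change, for example N537D (A1609G). A minimal Python sketch of that mapping follows; it is not part of the disclosure. It assumes, consistently with the listings, that nucleotide 1 is the first base of the first codon shown (GCC, residue A1) and uses the standard genetic code. The helper `describe_mutation` and the variable names are hypothetical. The two windows are copied from the listings: nucleotides 1561-1620 as they appear in FIG. 18B (which does not carry A1609G) and nucleotides 361-420 as they appear in FIG. 17A.

```python
import re
from itertools import product

# Standard genetic code in the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def describe_mutation(window, window_start, mutation):
    """Name the amino acid consequence of a substitution such as 'A1609G'.

    window is a codon-aligned stretch of the coding sequence whose first base
    has the 1-based position window_start.
    """
    ref, pos, alt = re.fullmatch(r"([ACGT])(\d+)([ACGT])", mutation).groups()
    pos = int(pos)
    if (window_start - 1) % 3 != 0 or len(window) % 3 != 0:
        raise ValueError("window must start and end on codon boundaries")
    i = pos - window_start
    if not 0 <= i < len(window) or window[i] != ref:
        raise ValueError("reference base does not match the window")
    frame = i % 3
    old_codon = window[i - frame:i - frame + 3]
    new_codon = old_codon[:frame] + alt + old_codon[frame + 1:]
    residue = (pos - 1) // 3 + 1
    old_aa, new_aa = CODON_TABLE[old_codon], CODON_TABLE[new_codon]
    if old_aa == new_aa:
        return f"{old_aa}{residue} (silent)"
    return f"{old_aa}{residue}{new_aa}"


# Nucleotides 1561-1620 without the A1609G change (as in FIG. 18B).
WINDOW_1561 = (
    "AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA "
    "AAC TAT CTT TAC AAT AGC AAC GGC AAT CTC"
).replace(" ", "")

# Nucleotides 361-420 without the T408C change (as in FIG. 17A).
WINDOW_361 = (
    "GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT "
    "GAA GCG AAT GGC CAG CCT TGG ACT AGC ATT"
).replace(" ", "")

if __name__ == "__main__":
    print(describe_mutation(WINDOW_1561, 1561, "A1609G"))
    print(describe_mutation(WINDOW_361, 361, "T408C"))
```

Under these assumptions the first call names the change as N537D and the second reports P136 as silent, matching the header of FIG. 17A and the silent mutation at P136 recited in the claims.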

Date : 2000.04.10 FIG. 18A

Mutant ID : 9.16.6C11

Mutation : V494A (T1481C), C515S (T1543A)

Sequence Size : 1917 



10 20 30 40 50 60

GCC TCA GCA CCT ATC GGA AGC GCC ATT TCT CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT
ASAPIGSAISRNNWAVTCDS

70 80 90 100 110 120 

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC 
AQSGNECNKAIDGNKDTFWH

130 140 150 160 170 180 

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG 
TFYGANGDPKPPHTYTIDMK 

190 200 210 220 230 240 

ACA ACT CAG AAC GTC AAC GGC TTG TCT ATG CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC 
TTQNVNGLSMLPRQDGNQNG

250 260 270 280 290 300 

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT 
WIGRHEVYLSSDGTNWGSPV

310 320 330 340 350 360 

GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT 
ASGSWFADSTTKYSNFETRP 

370 380 390 400 410 420 

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT GAA GCG AAT GGC CAG CCT TGG ACT AGC ATT 
ARYVRLVAITEANGQPWTSI

430 440 450 460 470 480

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC 
AEINVFQASSYTAPQPGLGR

490 500 510 520 530 540

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG 
WGPTIDLPIVPAAAAIEPTS

550 560 570 580 590 600 

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC AAT GAT GCA TTT GGA GGA TCC CCT GGT GGT 
GRVLMWSSYRNDAFGGSPGG

610 620 630 640 650 660 

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC ACT GGT ATT GTT TCC GAC CGC ACT GTG ACA 
ITLTSSWDPSTGIVSDRTVT

670 680 690 700 710 720

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA 
VTKHDMFCPGISMDGNGQIV

730 740 750 760 770 780 

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG 
VTGGNDAKKTSLYDSSSDSW 

790 800 810 820 830 840 

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 
IPGPDMQVARGYQSSATMSD

FIG. 18B



850 860 870 880 890 900

GGT CGT GTT TTT ACC ATT GGA GGC TCC TGG AGC GGT GGC GTA TTT GAG AAG AAT GGC GAA 
GRVFTIGGSWSGGVFEKNGE

910 920 930 940 950 960 

GTC TAT AGC CCA TCT TCA AAG ACA TGG ACG TCC CTA CCC AAT GCC AAG GTC AAC CCA ATG 
VYSPSSKTWTSLPNAKVNPM

970 980 990 1000 1010 1020 

TTG ACG GCT GAC AAG CAA GGA TTG TAC CGT TCA GAC AAC CAC GCG TGG CTC TTT GGA TGG 
LTADKQGLYRSDNHAWLFGW 

1030 1040 1050 1060 1070 1080 

AAG AAG GGT TCG GTG TTC CAA GCG GGA CCT AGC ACA GCC ATG AAC TGG TAC TAT ACC AGT 
KKGSVFQAGPSTAMNWYYTS

1090 1100 1110 1120 1130 1140

GGA AGT GGT GAT GTG AAG TCA GCC GGA AAA CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT 
GSGDVKSAGKRQSNRGVAPD 

1150 1160 1170 1180 1190 1200

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC 
AMCGNAVMYDAVKGKILTFG

1210 1220 1230 1240 1250 1260 

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC AAC GCC CAC ATC ATC ACC CTC GGT 
GSPDYQDSDATTNAHIITLG

1270 1280 1290 1300 1310 1320

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTSPNTVFASNGLYFART 

1330 1340 1350 1360 1370 1380 

TTT CAC ACC TCT GTT GTT CTT CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT 
FHTSVVLPDGSTFITGGQRR

1390 1400 1410 1420 1430 1440 

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA
GIPFEDSTPVFTPEIYVPEQ

1450 1460 1470 1480 1490 1500 

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GCC TAC CAT AGC ATT TCC CTT 
DTFYKQNPNSIVRAYHSISL

1510 1520 1530 1540 1550 1560 

TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT AGT GGC GAT TGT ACC ACG 
LLPDGRVFNGGGGLSGDCTT

1570 1580 1590 1600 1610 1620 

AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA AAC TAT CTT TAC AAT AGC AAC GGC AAT CTC 
NHFDAQIFTPNYLYNSNGNL 

1630 1640 1650 1660 1670 1680

GCG ACA CGT CCC AAG ATT ACC AGA ACC TCT ACA CAG AGC GTC AAG GTC GGT GGC AGA ATT 
ATRPKITRTSTQSVKVGGRI

1690 1700 1710 1720 1730 1740

ACA ATC TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA 
TISTDSSISKASLIRYGTAT

1750 1760 1770 1780 1790 1800 

CAC ACG GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT 
HTVNTDQRRIPLTLTNNGGN

1810 1820 1830 1840 1850 1860

AGC TAT TCT TTC CAA GTT CCT AGC GAC TCT GGT GTT GCT TTG CCT GGC TAC TGG ATG TTG 
SYSFQVPSDSGVALPGYWML

1870 1880 1890 1900 1910 1920 

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT GTG GCT TCG ACG ATT CGC GTT ACT CAG 
FVMNSAGVPSVASTIRVTQ 



FIG. 18C 
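
The nucleotide changes named in a figure header can also be recovered by comparing two listings base by base. The sketch below is illustrative only and not part of the disclosure; it compares nucleotides 1441-1560 as they appear in FIG. 17B (taken here as the reference, since that mutant carries no change in this region) with the same region in FIG. 18B. The function name `call_mutations` is hypothetical, and only substitutions are handled (no insertions or deletions).

```python
def call_mutations(reference, variant, start):
    """List substitutions as <ref base><position><new base>, counting from `start`."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length (substitutions only)")
    return [
        f"{r}{start + i}{v}"
        for i, (r, v) in enumerate(zip(reference, variant))
        if r != v
    ]


# Nucleotides 1441-1560 as listed in FIG. 17B.
REF_1441 = (
    "GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GTC TAC CAT AGC ATT TCC CTT "
    "TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT TGT GGC GAT TGT ACC ACG"
).replace(" ", "")

# The same region as listed in FIG. 18B.
VAR_1441 = (
    "GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GCC TAC CAT AGC ATT TCC CTT "
    "TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT AGT GGC GAT TGT ACC ACG"
).replace(" ", "")

if __name__ == "__main__":
    print(call_mutations(REF_1441, VAR_1441, 1441))
```

The two differences found this way are T1481C and T1543A, the nucleotide changes given in the FIG. 18A header for V494A and C515S.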

Date : 2000.04.10 FIG. 19A

Mutant ID : 9.16.16D12

Mutation : P136 (T408C), V494A (T1481C)

Sequence Size : 1917

10 20 30 40 50 60 

GCC TCA GCA CCT ATC GGA AGC GCC ATT TCT CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT 
ASAPIGSAISRNNWAVTCDS 

70 80 90 100 110 120

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC 
AQSGNECNKAIDGNKDTFWH

130 140 150 160 170 180

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG
TFYGANGDPKPPHTYTIDMK 

190 200 210 220 230 240 

ACA ACT CAG AAC GTC AAC GGC TTG TCT ATG CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC 
TTQNVNGLSMLPRQDGNQNG

250 260 270 280 290 300

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT 
WIGRHEVYLSSDGTNWGSPV 

310 320 330 340 350 360 

GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT 
ASGSWFADSTTKYSNFETRP

370 380 390 400 410 420 

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT GAA GCG AAT GGC CAG CCC TGG ACT AGC ATT 
ARYVRLVAITEANGQPWTSI 

430 440 450 460 470 480 

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC
AEINVFQASSYTAPQPGLGR

490 500 510 520 530 540

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG
WGPTIDLPIVPAAAAIEPTS

550 560 570 580 590 600

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC AAT GAT GCA TTT GGA GGA TCC CCT GGT GGT 
GRVLMWSSYRNDAFGGSPGG

610 620 630 640 650 660 

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC ACT GGT ATT GTT TCC GAC CGC ACT GTG ACA 
ITLTSSWDPSTGIVSDRTVT

670 680 690 700 710 720 

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA 
VTKHDMFCPGISMDGNGQIV

730 740 750 760 770 780 

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG 
VTGGNDAKKTSLYDSSSDSW

790 800 810 820 830 840 

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 
IPGPDMQVARGYQSSATMSD

FIG. 19B



850 860 870 880 890 900

GGT CGT GTT TTT ACC ATT GGA GGC TCC TGG AGC GGT GGC GTA TTT GAG AAG AAT GGC GAA 
GRVFTIGGSWSGGVFEKNGE

910 920 930 940 950 960 

GTC TAT AGC CCA TCT TCA AAG ACA TGG ACG TCC CTA CCC AAT GCC AAG GTC AAC CCA ATG 
VYSPSSKTWTSLPNAKVNPM 

970 980 990 1000 1010 1020

TTG ACG GCT GAC AAG CAA GGA TTG TAC CGT TCA GAC AAC CAC GCG TGG CTC TTT GGA TGG 
LTADKQGLYRSDNHAWLFGW

1030 1040 1050 1060 1070 1080

AAG AAG GGT TCG GTG TTC CAA GCG GGA CCT AGC ACA GCC ATG AAC TGG TAC TAT ACC AGT
KKGSVFQAGPSTAMNWYYTS

1090 1100 1110 1120 1130 1140 

GGA AGT GGT GAT GTG AAG TCA GCC GGA AAA CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT 
GSGDVKSAGKRQSNRGVAPD

1150 1160 1170 1180 1190 1200

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC
AMCGNAVMYDAVKGKILTFG



1210 1220 1230 1240 1250 1260 

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC AAC GCC CAC ATC ATC ACC CTC GGT 
GSPDYQDSDATTNAHIITLG

1270 1280 1290 1300 1310 1320

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTSPNTVFASNGLYFART 

1330 1340 1350 1360 1370 1380 

TTT CAC ACC TCT GTT GTT CTT CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT 
FHTSVVLPDGSTFITGGQRR

1390 1400 1410 1420 1430 1440 

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA
GIPFEDSTPVFTPEIYVPEQ

1450 1460 1470 1480 1490 1500 

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GCC TAC CAT AGC ATT TCC CTT 
DTFYKQNPNSIVRAYHSISL

1510 1520 1530 1540 1550 1560 

TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT TGT GGC GAT TGT ACC ACG 
LLPDGRVFNGGGGLCGDCTT



1570 1580 1590 1600 1610 1620 

AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA AAC TAT CTT TAC AAT AGC AAC GGC AAT CTC 
NHFDAQIFTPNYLYNSNGNL 

1630 1640 1650 1660 1670 1680 

GCG ACA CGT CCC AAG ATT ACC AGA ACC TCT ACA CAG AGC GTC AAG GTC GGT GGC AGA ATT
ATRPKITRTSTQSVKVGGRI

1690 1700 1710 1720 1730 1740 

ACA ATC TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA 
TISTDSSISKASLIRYGTAT

1750 1760 1770 1780 1790 1800 

CAC ACG GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT
HTVNTDQRRIPLTLTNNGGN
1810 1820 1830 1840 1850 1860

AGC TAT TCT TTC CAA GTT CCT AGC GAC TCT GGT GTT GCT TTG CCT GGC TAC TGG ATG TTG
SYSFQVPSDSGVALPGYWML

1870 1880 1890 1900 1910 1920

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT GTG GCT TCG ACG ATT CGC GTT ACT CAG
FVMNSAGVPSVASTIRVTQ
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Date : 2000.04.13 FIG. 20A 

Mutant ID : 11.03.6D3 

Mutation : S10P(T28C), P136(T408C), V494A (T1481C) 

Sequence Size : 1917 
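
The mutation labels in these figure headers pair an amino-acid change with the nucleotide substitution that causes it, for example S10P(T28C); a label with no second residue, such as P136(T408C), marks a silent change. The following is a minimal Python sketch, not part of the original disclosure, of how such a label can be checked against a printed sequence. It assumes the standard genetic code and 1-based nucleotide numbering that starts at the first printed base. The function names and the 30-base test prefix are illustrative; the prefix is copied from a figure whose mutation list does not include S10P (FIG. 22A).

```python
import re
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def clean(dna):
    """Drop the codon spacing used in the figures and upper-case the bases."""
    return re.sub(r"\s+", "", dna).upper()


def translate(dna):
    """Translate all complete codons, reading from position 1."""
    seq = clean(dna)
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def apply_substitution(dna, label):
    """Apply a label such as 'T28C' (old base, 1-based position, new base)."""
    match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", label)
    if match is None:
        raise ValueError(f"unrecognized substitution label: {label}")
    old, pos, new = match.group(1), int(match.group(2)), match.group(3)
    seq = clean(dna)
    if seq[pos - 1] != old:
        raise ValueError(f"position {pos} holds {seq[pos - 1]}, not {old}")
    return seq[:pos - 1] + new + seq[pos:]


def residue_change(dna, label):
    """Return (codon number, residue before, residue after) for one substitution."""
    pos = int(re.search(r"\d+", label).group())
    codon = (pos - 1) // 3 + 1
    before = translate(dna)[codon - 1]
    after = translate(apply_substitution(dna, label))[codon - 1]
    return codon, before, after


# Assumption: first 30 bases as printed in FIG. 22A (codon 10 is TCT, Ser).
PARENT_PREFIX = "GCC TCA GCA CCT ATC GGA AGC GCC ATT TCT"

print(residue_change(PARENT_PREFIX, "T28C"))  # (10, 'S', 'P')
```

Under the same assumptions, a sequence size of 1917 bases corresponds to 639 complete codons, so the same helper can be run over a full printed sequence once the rows are concatenated.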

10 20 30 40 50 60 

GCC TCA GCA CCT ATC GGA AGC GCC ATT CCT CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT 
ASAPIGSAIPRNNWAVTCDS 

70 80 90 100 110 120 

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC 
AQSGNECNKAIDGNKDTFWH

130 140 150 160 170 180 

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG 
TFYGANGDPKPPHTYTIDMK

190 200 210 220 230 240 

ACA ACT CAG AAC GTC AAC GGC TTG TCT ATG CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC 
TTQNVNGLSMLPRQDGNQNG

250 260 270 280 290 300 

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT 
WIGRHEVYLSSDGTNWGSPV

310 320 330 340 350 360 

GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT
ASGSWFADSTTKYSNFETRP

370 380 390 400 410 420 

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT GAA GCG AAT GGC CAG CCC TGG ACT AGC ATT
ARYVRLVAITEANGQPWTSI

430 440 450 460 470 480 

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC 
AEINVFQASSYTAPQPGLGR

490 500 510 520 530 540 

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG 
WGPTIDLPIVPAAAAIEPTS 

550 560 570 580 590 600 

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC AAT GAT GCA TTT GGA GGA TCC CCT GGT GGT 
GRVLMWSSYRNDAFGGSPGG

610 620 630 640 650 660 

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC ACT GGT ATT GTT TCC GAC CGC ACT GTG ACA 
ITLTSSWDPSTGIVSDRTVT

670 680 690 700 710 720 

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA 
VTKHDMFCPGISMDGNGQIV 

730 740 750 760 770 780

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG 
VTGGNDAKKTSLYDSSSDSW

790 800 810 820 830 840 

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 
IPGPDMQVARGYQSSATMSD
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FIG. 20B 

850 860 870 880 890 900 

GGT CGT GTT TTT ACC ATT GGA GGC TCC TGG AGC GGT GGC GTA TTT GAG AAG AAT GGC GAA 
GRVFTIGGSWSGGVFEKNGE

910 920 930 940 950 960 

GTC TAT AGC CCA TCT TCA AAG ACA TGG ACG TCC CTA CCC AAT GCC AAG GTC AAC CCA ATG 
VYSPSSKTWTSLPNAKVNPM 

970 980 990 1000 1010 1020

TTG ACG GCT GAC AAG CAA GGA TTG TAC CGT TCA GAC AAC CAC GCG TGG CTC TTT GGA TGG 
LTADKQGLYRSDNHAWLFGW 

1030 1040 1050 1060 1070 1080 

AAG AAG GGT TCG GTG TTC CAA GCG GGA CCT AGC ACA GCC ATG AAC TGG TAC TAT ACC AGT
KKGSVFQAGPSTAMNWYYTS

1090 1100 1110 1120 1130 1140 

GGA AGT GGT GAT GTG AAG TCA GCC GGA AAA CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT 
GSGDVKSAGKRQSNRGVAPD 

1150 1160 1170 1180 1190 1200

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC 
AMCGNAVMYDAVKGKILTFG

1210 1220 1230 1240 1250 1260 

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC AAC GCC CAC ATC ATC ACC CTC GGT 
GSPDYQDSDATTNAHIITLG

1270 1280 1290 1300 1310 1320 

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTSPNTVFASNGLYFART 

1330 1340 1350 1360 1370 1380 

TTT CAC ACC TCT GTT GTT CTC CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT 
FHTSVVLPDGSTFITGGQRR 

1390 1400 1410 1420 1430 1440 

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA
GIPFEDSTPVFTPEIYVPEQ

1450 1460 1470 1480 1490 1500 

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GCC TAC CAT AGC ATT TCC CTT 
DTFYKQNPNSIVRAYHSISL

1510 1520 1530 1540 1550 1560 

TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT TGT GGC GAT TGT ACC ACG 
LLPDGRVFNGGGGLCGDCTT

1570 1580 1590 1600 1610 1620

AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA AAC TAT CTT TAC AAT AGC AAC GGC AAT CTC 
NHFDAQIFTPNYLYNSNGNL

1630 1640 1650 1660 1670 1680 

GCG ACA CGT CCC AAG ATT ACC AGA ACC TCT ACA CAG AGC GTC AAG GTC GGT GGC AGA ATT 
ATRPKITRTSTQSVKVGGRI

1690 1700 1710 1720 1730 1740 

ACA ATC TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA 
TISTDSSISKASLIRYGTAT

1750 1760 1770 1780 1790 1800 

CAC ACG GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT 
HTVNTDQRRIPLTLTNNGGN
1810 1820 1830 1840 1850 1860

AGC TAT TCT TTC CAA GTT CCT AGC GAC TCT GGT GTT GCT TTG CCT GGC TAC TGG ATG TTG
SYSFQVPSDSGVALPGYWML

1870 1880 1890 1900 1910 1920

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT GTG GCT TCG ACG ATT CGC GTT ACT CAG
FVMNSAGVPSVASTIRVTQ



FIG. 20C 
Date : 2000.04.10 FIG. 21A

Mutant ID : 11.03.10C3

Mutation : A3(A9C), P136(T408C), G195E (G584A), V494A (T1481C)

Sequence Size : 1917



10 20 30 40 50 60 

GCC TCA GCC CCT ATC GGA AGC GCC ATT TCT CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT
ASAPIGSAISRNNWAVTCDS



70 80 90 100 110 120

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC 
AQSGNECNKAIDGNKDTFWH



130 140 150 160 170 180

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG 
TFYGANGDPKPPHTYTIDMK

190 200 210 220 230 240 

ACA ACT CAG AAC GTC AAC GGC TTG TCT ATG CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC 
TTQNVNGLSMLPRQDGNQNG

250 260 270 280 290 300 

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT 
WIGRHEVYLSSDGTNWGSPV

310 320 330 340 350 360 

GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT 
ASGSWFADSTTKYSNFETRP

370 380 390 400 410 420 

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT GAA GCG AAT GGC CAG CCC TGG ACT AGC ATT 
ARYVRLVAITEANGQPWTSI

430 440 450 460 470 480 

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC 
AEINVFQASSYTAPQPGLGR

490 500 510 520 530 540

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG 
WGPTIDLPIVPAAAAIEPTS

550 560 570 580 590 600 

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC AAT GAT GCA TTT GAA GGA TCC CCT GGT GGT 
GRVLMWSSYRNDAFEGSPGG

610 620 630 640 650 660

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC ACT GGT ATT GTT TCC GAC CGC ACT GTG ACA 
ITLTSSWDPSTGIVSDRTVT

670 680 690 700 710 720 

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA 
VTKHDMFCPGISMDGNGQIV



730 740 750 760 770 780

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG 
VTGGNDAKKTSLYDSSSDSW

790 800 810 820 830 840

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 

IPGPDMQVARGYQSSATMSD
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FIG. 21B 



GGT ...

... 1010 1020

... AAC CAC GCG TGG CTC TTT GGA TGG
... SDNHAWLFGW

1030 1040 1050 1060 1070 1080

AAG AAG GGT TCG GTG TTC CAA GCG ...
KKGSVFQA ... S ...

1090 1100 1110 1120 1130 1140

... CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT
... V ... GKRQSNRGVAPD



1150 1160 1170 1180 1190 1200

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC 
AMCGNAVMYDAVKGKILTFG

1210 1220 1230 1240 1250 1260 

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC AAC GCC CAC ATC ATC ACC CTC GGT 
GSPDYQDSDATTNAHIITLG

1270 1280 1290 1300 1310 1320 

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTSPNTVFASNGLYFART

1330 1340 1350 1360 1370 1380 

TTT CAC ACC TCT GTT GTT CTT CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT 
FHTSVVLPDGSTFITGGQRR

1390 1400 1410 1420 1430 1440 

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA
GIPFEDSTPVFTPEIYVPEQ

1450 1460 1470 1480 1490 1500 

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GCC TAC CAT AGC ATT TCC CTT
DTFYKQNPNSIVRAYHSISL



1510 1520 1530 1540 1550 1560

... CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT TGT GGC GAT TGT ACC ACG
... PDGRVFNGGGGLCGDCTT

1570 1580 1590 1600 1610 1620

... TTC GAC GCG ... TAT ... AAT AGC AAC GGC AAT CTC
... FD ... TPNYLYNSN ...

1630 1640 1650 1660 1670 1680

... CGT CCC AAG ... ACC TCT ACA CAG ... AAG GTC GGT GGC AGA ATT
... KITRTSTQSVKVGGRI

1690 1700 1710 1720 1730 1740

... TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA
... STDSSISKASLIRYGTAT

1750 1760 1770 1780 1790 1800

... GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT
... VNTDQRRIPLTLTNN ...

1810 1820 1830 1840 1850 1860

AGC TAT TCT TTC CAA GTT CCT AGC GAC TCT GGT GTT GCT TTG CCT GGC TAC TGG ATG TTG
SYSFQVPSDSGVALPGYWML

1870 1880 1890 1900 1910 1920

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT GTG GCT TCG ACG ATT CGC GTT ACT CAG
FVMNSAGVPSVASTIRVTQ



FIG. 21C 
Date : 2000.04.10 FIG. 22A

Mutant ID : 11.03.10D6

Mutation : P136(T408C), T218(T654C), L312 (A936G), V494A (T1481C), N535D(A1603G)

Sequence Size : 1917



10 20 30 40 50 60 

GCC TCA GCA CCT ATC GGA AGC GCC ATT TCT CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT 
ASAPIGSAISRNNWAVTCDS 

70 80 90 100 110 120 

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC
AQSGNECNKAIDGNKDTFWH 

130 140 150 160 170 180 

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG 
TFYGANGDPKPPHTYTIDMK

190 200 210 220 230 240 

ACA ACT CAG AAC GTC AAC GGC TTG TCT ATG CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC 
TTQNVNGLSMLPRQDGNQNG

250 260 270 280 290 300 

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT 
WIGRHEVYLSSDGTNWGSPV

310 320 330 340 350 360 

GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT 
ASGSWFADSTTKYSNFETRP

370 380 390 400 410 420 

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT GAA GCG AAT GGC CAG CCC TGG ACT AGC ATT 
ARYVRLVAITEANGQPWTSI

430 440 450 460 470 480 

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC 
AEINVFQASSYTAPQPGLGR

490 500 510 520 530 540 

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG 
WGPTIDLPIVPAAAAIEPTS

550 560 570 580 590 600 

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC AAT GAT GCA TTT GGA GGA TCC CCT GGT GGT 
GRVLMWSSYRNDAFGGSPGG

610 620 630 640 650 660 

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC ACT GGT ATT GTT TCC GAC CGC ACC GTG ACA
ITLTSSWDPSTGIVSDRTVT

670 680 690 700 710 720 

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA 
VTKHDMFCPGISMDGNGQIV 

730 740 750 760 770 780 

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG 
VTGGNDAKKTSLYDSSSDSW

790 800 810 820 830 840 

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 
IPGPDMQVARGYQSSATMSD



FIG. 22B



850 860 870 880 890 900

GGT CGT GTT TTT ACC ATT GGA GGC TCC TGG AGC GGT GGC GTA TTT GAG AAG AAT GGC GAA 
GRVFTIGGSWSGGVFEKNGE

910 920 930 940 950 960 

GTC TAT AGC CCA TCT TCA AAG ACA TGG ACG TCC CTG CCC AAT GCC AAG GTC AAC CCA ATG 
VYSPSSKTWTSLPNAKVNPM 

970 980 990 1000 1010 1020 

TTG ACG GCT GAC AAG CAA GGA TTG TAC CGT TCA GAC AAC CAC GCG TGG CTC TTT GGA TGG 
LTADKQGLYRSDNHAWLFGW

1030 1040 1050 1060 1070 1080

AAG AAG GGT TCG GTG TTC CAA GCG GGA CCT AGC ACA GCC ATG AAC TGG TAC TAT ACC AGT 
KKGSVFQAGPSTAMNWYYTS

1090 1100 1110 1120 1130 1140 

GGA AGT GGT GAT GTG AAG TCA GCC GGA AAA CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT 
GSGDVKSAGKRQSNRGVAPD

1150 1160 1170 1180 1190 1200

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC 
AMCGNAVMYDAVKGKILTFG

1210 1220 1230 1240 1250 1260

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC AAC GCC CAC ATC ATC ACC CTC GGT 
GSPDYQDSDATTNAHIITLG 

1270 1280 1290 1300 1310 1320 

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTSPNTVFASNGLYFART 

1330 1340 1350 1360 1370 1380 

TTT CAC ACC TCT GTT GTT CTT CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT 
FHTSVVLPDGSTFITGGQRR

1390 1400 1410 1420 1430 1440 

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA 

GIPFEDSTPVFTPEIYVPEQ

1450 1460 1470 1480 1490 1500

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GCC TAC CAT AGC ATT TCC CTT 
DTFYKQNPNSIVRAYHSISL

1510 1520 1530 1540 1550 1560

TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT TGT GGC GAT TGT ACC ACG 
LLPDGRVFNGGGGLCGDCTT

1570 1580 1590 1600 1610 1620

AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA AAC TAT CTT TAC GAT AGC AAC GGC AAT CTC 
NHFDAQIFTPNYLYDSNGNL 

1630 1640 1650 1660 1670 1680

GCG ACA CGT CCC AAG ATT ACC AGA ACC TCT ACA CAG AGC GTC AAG GTC GGT GGC AGA ATT 
ATRPKITRTSTQSVKVGGRI 

1690 1700 1710 1720 1730 1740

ACA ATC TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA 

TISTDSSISKASLIRYGTAT

1750 1760 1770 1780 1790 1800

CAC ACG GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT 

HTVNTDQRRIPLTLTNNGGN
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WO 01/88110 



PCT/US00/32345



1810 1820 1830 1840 1850 1860



AGC TAT TCT TTC CAA GTT CCT AGC GAC TCT GGT GTT GCT TTG CCT GGC TAC TGG ATG TTG
SYSFQVPSDSGVALPGYWML 

1870 1880 1890 1900 1910 1920 

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT GTG GCT TCG ACG ATT CGC GTT ACT CAG
FVMNSAGVPSVASTIRVTQ
Date : 2000.04.10 FIG. 23A

Mutant ID : 11.03.13E12

Mutation : M70V(A208G), P136(T408C), V494A (T1481C)

Sequence Size : 1917



10 20 30 40 50 60 

GCC TCA GCA CCT ATC GGA AGC GCC ATT TCT CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT 
ASAPIGSAISRNNWAVTCDS

70 80 90 100 110 120

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC 
AQSGNECNKAIDGNKDTFWH 

130 140 150 160 170 180 

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG 
TFYGANGDPKPPHTYTIDMK

190 200 210 220 230 240

ACA ACT CAG AAC GTC AAC GGC TTG TCT GTG CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC 

TTQNVNGLSVLPRQDGNQNG



250 260 270 280 290 300

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT 
WIGRHEVYLSSDGTNWGSPV



310 320 330 340 350 360 

GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT 
ASGSWFADSTTKYSNFETRP 



370 380 390 400 410 420

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT GAA GCG AAT GGC CAG CCC TGG ACT AGC ATT
ARYVRLVAITEANGQPWTSI

430 440 450 460 470 480 

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC 
AEINVFQASSYTAPQPGLGR

490 500 510 520 530 540

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG 
WGPTIDLPIVPAAAAIEPTS 

550 560 570 580 590 600 

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC AAT GAT GCA TTT GGA GGA TCC CCT GGT GGT 
GRVLMWSSYRNDAFGGSPGG 

610 620 630 640 650 660

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC ACT GGT ATT GTT TCC GAC CGC ACT GTG ACA 
ITLTSSWDPSTGIVSDRTVT

670 680 690 700 710 720

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA 

VTKHDMFCPGISMDGNGQIV



730 740 750 760 770 780 

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG 
VTGGNDAKKTSLYDSSSDSW

790 800 810 820 830 840 

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 
IPGPDMQVARGYQSSATMSD 
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FIG. 23B 



850 860 870 880 890 900 

GGT CGT GTT TTT ACC ATT GGA GGC TCC TGG AGC GGT GGC GTA TTT GAG AAG AAT GGC GAA 
GRVFTIGGSWSGGVFEKNGE

910 920 930 940 950 960 

GTC TAT AGC CCA TCT TCA AAG ACA TGG ACG TCC CTA CCC AAT GCC AAG GTC AAC CCA ATG 
VYSPSSKTWTSLPNAKVNPM 

970 980 990 1000 1010 1020

TTG ACG GCT GAC AAG CAA GGA TTG TAC CGT TCA GAC AAC CAC GCG TGG CTC TTT GGA TGG 
LTADKQGLYRSDNHAWLFGW 

1030 1040 1050 1060 1070 1080 

AAG AAG GGT TCG GTG TTC CAA GCG GGA CCT AGC ACA GCC ATG AAC TGG TAC TAT ACC AGT 
KKGSVFQAGPSTAMNWYYTS

1090 1100 1110 1120 1130 1140 

GGA AGT GGT GAT GTG AAG TCA GCC GGA AAA CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT 
GSGDVKSAGKRQSNRGVAPD 

1150 1160 1170 1180 1190 1200 

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC 
AMCGNAVMYDAVKGKILTFG

1210 1220 1230 1240 1250 1260 

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC AAC GCC CAC ATC ATC ACC CTC GGT 
GSPDYQDSDATTNAHIITLG

1270 1280 1290 1300 1310 1320 

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTSPNTVFASNGLYFART

1330 1340 1350 1360 1370 1380 

TTT CAC ACC TCT GTT GTT CTT CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT 
FHTSVVLPDGSTFITGGQRR

1390 1400 1410 1420 1430 1440 

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA 
GIPFEDSTPVFTPEIYVPEQ

1450 1460 1470 1480 1490 1500

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GCC TAC CAT AGC ATT TCC CTT 
DTFYKQNPNSIVRAYHSISL

1510 1520 1530 1540 1550 1560 

TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT TGT GGC GAT TGT ACC ACG 
LLPDGRVFNGGGGLCGDCTT

1570 1580 1590 1600 1610 1620 

AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA AAC TAT CTT TAC AAT AGC AAC GGC AAT CTC 
NHFDAQIFTPNYLYNSNGNL 

1630 1640 1650 1660 1670 1680 

GCG ACA CGT CCC AAG ATT ACC AGA ACC TCT ACA CAG AGC GTC AAG GTC GGT GGC AGA ATT 
ATRPKITRTSTQSVKVGGRI

1690 1700 1710 1720 1730 1740

ACA ATC TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA 
TISTDSSISKASLIRYGTAT

1750 1760 1770 1780 1790 1800 

CAC ACG GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT 
HTVNTDQRRIPLTLTNNGGN



1810 1820 1830 1840 1850 1860

AGC TAT TCT TTC CAA GTT CCT AGC GAC TCT GGT GTT GCT TTG CCT GGC TAC TGG ATG TTG 
SYSFQVPSDSGVALPGYWML

1870 1880 1890 1900 1910 1920

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT GTG GCT TCG ACG ATT CGC GTT ACT CAG 
FVMNSAGVPSVASTIRVTQ 
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FIG. 24A 



Date : 2000.04.10

Filename : 1.06.20E7

Mutation : S10P(T28C), M70V (A208G), P136(T408C), G195E (G584A), V494A(T1481C), N535D(A1603G)

Sequence Size : 1917



10 20 30 40 50 60

GCC TCA GCA CCT ATC GGA AGC GCC ATT CCT CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT 
ASAPIGSAIPRNNWAVTCDS 

70 80 90 100 110 120 

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC 
AQSGNECNKAIDGNKDTFWH

130 140 150 160 170 180 

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG 
TFYGANGDPKPPHTYTIDMK 

190 200 210 220 230 240 

ACA ACT CAG AAC GTC AAC GGC TTG TCT GTG CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC 
TTQNVNGLSVLPRQDGNQNG

250 260 270 280 290 300 

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT 
WIGRHEVYLSSDGTNWGSPV

310 320 330 340 350 360 

GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT 
ASGSWFADSTTKYSNFETRP

370 380 390 400 410 420

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT GAA GCG AAT GGC CAG CCC TGG ACT AGC ATT 
ARYVRLVAITEANGQPWTSI



430 440 450 460 470 480 

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC 
AEINVFQASSYTAPQPGLGR



490 500 510 520 530 540 

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG 
WGPTIDLPIVPAAAAIEPTS 

550 560 570 580 590 600 

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC AAT GAT GCA TTT GAA GGA TCC CCT GGT GGT 
GRVLMWSSYRNDAFEGSPGG

610 620 630 640 650 660

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC ACT GGT ATT GTT TCC GAC CGC ACT GTG ACA 
ITLTSSWDPSTGIVSDRTVT

670 680 690 700 710 720 

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA
VTKHDMFCPGISMDGNGQIV 

730 740 750 760 770 780 

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG 
VTGGNDAKKTSLYDSSSDSW

790 800 810 820 830 840 

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 
IPGPDMQVARGYQSSATMSD
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FIG. 24B 



850 860 870 880 890 900 

GGT CGT GTT TTT ACC ATT GGA GGC TCC TGG AGC GGT GGC GTA TTT GAG AAG AAT GGC GAA 
GRVFTIGGSWSGGVFEKNGE

910 920 930 940 950 960 

GTC TAT AGC CCA TCT TCA AAG ACA TGG ACG TCC CTA CCC AAT GCC AAG GTC AAC CCA ATG 
VYSPSSKTWTSLPNAKVNPM

970 980 990 1000 1010 1020 

TTG ACG GCT GAC AAG CAA GGA TTG TAC CGT TCA GAC AAC CAC GCG TGG CTC TTT GGA TGG 
LTADKQGLYRSDNHAWLFGW

1030 1040 1050 1060 1070 1080 

AAG AAG GGT TCG GTG TTC CAA GCG GGA CCT AGC ACA GCC ATG AAC TGG TAC TAT ACC AGT 
KKGSVFQAGPSTAMNWYYTS

1090 1100 1110 1120 1130 1140

GGA AGT GGT GAT GTG AAG TCA GCC GGA AAA CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT 
GSGDVKSAGKRQSNRGVAPD 

1150 1160 1170 1180 1190 1200

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC 
AMCGNAVMYDAVKGKILTFG

1210 1220 1230 1240 1250 1260 

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC AAC GCC CAC ATC ATC ACC CTC GGT 
GSPDYQDSDATTNAHIITLG

1270 1280 1290 1300 1310 1320 

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTSPNTVFASNGLYFART

1330 1340 1350 1360 1370 1380 

TTT CAC ACC TCT GTT GTT CTT CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT 
FHTSVVLPDGSTFITGGQRR

1390 1400 1410 1420 1430 1440 

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA 
GIPFEDSTPVFTPEIYVPEQ

1450 1460 1470 1480 1490 1500 

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GCC TAC CAT AGC ATT TCC CTT 
DTFYKQNPNSIVRAYHSISL

1510 1520 1530 1540 1550 1560 

TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT TGT GGC GAT TGT ACC ACG 
LLPDGRVFNGGGGLCGDCTT

1570 1580 1590 1600 1610 1620 

AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA AAC TAT CTT TAC GAT AGC AAC GGC AAT CTC 
NHFDAQIFTPNYLYDSNGNL 

1630 1640 1650 1660 1670 1680 

GCG ACA CGT CCC AAG ATT ACC AGA ACC TCT ACA CAG AGC GTC AAG GTC GGT GGC AGA ATT 
ATRPKITRTSTQSVKVGGRI 

1690 1700 1710 1720 1730 1740 

ACA ATC TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA 
TISTDSSISKASLIRYGTAT

1750 1760 1770 1780 1790 1800 

CAC ACG GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT 
HTVNTDQRRIPLTLTNNGGN






1810 1820 1830 1840 1850 1860

AGC TAT TCT TTC CAA GTT CCT AGC GAC TCT GGT GTT GCT TTG CCT GGC TAC TGG ATG TTG 
SYSFQVPSDSGVALPGYWML



1870 1880 1890 1900 1910 1920 

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT GTG GCT TCG ACG ATT CGC GTT ACT CAG 
FVMNSAGVPSVASTIRVTQ



FIG. 24C 
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Date 

Mutant ID 
Mutation 
Sequence Size 



2000.04.11
1.D4 
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FIG. 25A 



10 20 30 40 50 60 

GCC TCA GCA CCT ATC GGA AGC GCC ATT TCT CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT 
ASAPIGSAISRNNWAVTCDS

70 80 90 100 110 120

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC 
AQSGNECNKAIDGNKDTFWH

130 140 150 160 170 180

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG 
TFYGANGDPKPPHTYTIDMK

190 200 210 220 230 240 

ACA ACT CAG AAC GTC AAC GGC TTG TCT ATG CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC 
TTQNVNGLSMLPRQDGNQNG

250 260 270 280 290 300 

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT 
WIGRHEVYLSSDGTNWGSPV

310 320 330 340 350 360 

GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT 
ASGSWFADSTTKYSNFETRP 

370 380 390 400 410 420

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT GAA GCG AAT GGC CAG CCT TGG ACT AGC ATT 
ARYVRLVAITEANGQPWTSI 

430 440 450 460 470 480 

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC 
AEINVFQASSYTAPQPGLGR

490 500 510 520 530 540 

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG 
WGPTIDLPIVPAAAAIEPTS



550 560 570 580 590 600

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC AAT GAT GCA TTT GGA GGA TCC CCT GGT GGT
GRVLMWSSYRNDAFGGSPGG



610 620 630 640 650 660 

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC ACT GGT ATT GTT TCC GAC CGC ACT GTG ACA 
ITLTSSWDPSTGIVSDRTVT

670 680 690 700 710 720 

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA 
VTKHDMFCPGISMDGNGQIV 

730 740 750 760 770 780

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG 
VTGGNDAKKTSLYDSSSDSW 

790 800 810 820 830 840 

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 
IPGPDMQVARGYQSSATMSD
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FIG. 25B 



850 860 870 880 890 900 

GGT CGT GTT TTT ACC ATT GGA GGC TCC TGG AGC GGT GGC GTA TTT GAG AAG AAT GGC GAA 
GRVFTIGGSWSGGVFEKNGE



910 920 930 940 950 960

GTC TAT AGC CCA TCT TCA AAG ACA TGG ACG TCC CTA CCC AAT GCC AAG GTC AAC CCA ATG
VYSPSSKTWTSLPNAKVNPM



970 980 990 1000 1010 1020

TTG ACG GCT GAC AAG CAA GGA TTG TAC CGT TCA GAC AAC CAC GCG TGG CTC TTT GGA TGG
LTADKQGLYRSDNHAWLFGW

1030 1040 1050 1060 1070 1080

AAG AAG GGT TCG GTG TTC CAA GCG GGA CCT AGC ACA GCC ATG AAC TGG TAC TAT ACC AGT 
KKGSVFQAGPSTAMNWYYTS 

1090 1100 1110 1120 1130 1140

GGA AGT GGT GAT GTG AAG TCA GCC GGA AAA CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT 
GSGDVKSAGKRQSNRGVAPD

1150 1160 1170 1180 1190 1200

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC 
AMCGNAVMYDAVKGKILTFG

1210 1220 1230 1240 1250 1260 

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC GAC GCC CAC ATC ATC ACC CTC GGT 
GSPDYQDSDATTDAHIITLG

1270 1280 1290 1300 1310 1320

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTSPNTVFASNGLYFART 

1330 1340 1350 1360 1370 1380

TTT CAC ACC TCT GTT GTT CTT CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT 
FHTSVVLPDGSTFITGGQRR



1390 1400 1410 1420 1430 1440

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA
GIPFEDSTPVFTPEIYVPEQ

1450 1460 1470 1480 1490 1500 

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GTC TAC CAT AGC ATT TCC CTT 
DTFYKQNPNSIVRVYHSISL

1510 1520 1530 1540 1550 1560

TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT TGT GGC GAT TGT ACC ACG 
LLPDGRVFNGGGGLCGDCTT

1570 1580 1590 1600 1610 1620

AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA AAC TAT CTT TAC AAT AGC AAC GGC AAT CTC 
NHFDAQIFTPNYLYNSNGNL

1630 1640 1650 1660 1670 1680 

GCG ACA CGT CCC AAG ATT ACC AGA ACC TCT ACA CAG AGC GTC AAG GTC GGT GGC AGA ATT 
ATRPKITRTSTQSVKVGGRI



1690 1700 1710 1720 1730 1740

ACA ATC TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA
TISTDSSISKASLIRYGTAT



1750 1760 1770 1780 1790 1800

CAC ACG GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT
HTVNTDQRRIPLTLTNNGGN



1810 1820 1830 1840 1850 1860

AGC TAT TCT TTC CAA GTT CCT AGC GAC TCT GGT GTT GCT TTG CCT GGC TAC TGG ATG TTG
SYSFQVPSDSGVALPGYWML

1870 1880 1890 1900 1910 1920

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT GTG GCT TCG ACG ATT CGC GTT ACT CAG
FVMNSAGVPSVASTIRVTQ



FIG. 25C 
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FIG. 26A 



10 20 30 40 50 60 

GCC TCA GCA CCT ATC GGA AGC GCC ATT TCT CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT 
ASAPIGSAISRNNWAVTCDS

70 80 90 100 110 120

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC 
AQSGNECNKAIDGNKDTFWH



130 140 150 160 170 180

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG
TFYGANGDPKPPHTYTIDMK

190 200 210 220 230 240

ACA ACT CAG AAC GTC AAC GGC TTG TCT ATG CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC 
TTQNVNGLSMLPRQDGNQNG

250 260 270 280 290 300 

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT 
WIGRHEVYLSSDGTNWGSPV



310 320 330 340 350 360



GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT 
ASGSWFADSTTKYSNFETRP 

370 380 390 400 410 420 

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT GAA GCG AAT GGC CAG CCT TGG ACT AGC ATT 
ARYVRLVAITEANGQPWTSI



430 440 450 460 470 480

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC
AEINVFQASSYTAPQPGLGR

490 500 510 520 530 540 

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG 
WGPTIDLPIVPAAAAIEPTS 



550 560 570 580 590 600

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC AAT GAT GCA TTT GGA GGA TCC CCT GGT GGT
GRVLMWSSYRNDAFGGSPGG



610 620 630 640 650 660 

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC ACT GGT ATT GTT TCC GAC CGC ACT GTG ACA 
ITLTSSWDPSTGIVSDRTVT

670 680 690 700 710 720

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA 
VTKHDMFCPGISMDGNGQIV 

730 740 750 760 770 780

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG 
VTGGNDAKKTSLYDSSSDSW 

790 800 810 820 830 840 

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 
IPGPDMQVARGYQSSATMSD
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FIG. 26B 



850 860 870 880 890 900 

GGT CGT GTT TTT ACC ATT GGA GGC TCC TGG AGC GGT GGC GTA TTT GAG AAG AAT GGC GAA 
GRVFTIGGSWSGGVFEKNGE

910 920 930 940 950 960

GTC TAT AGC CCA TCT TCA AAG ACA TGG ACG TCC CTA CCC AAT GCC AAG GTC AAC CCA ATG 
VYSPSSKTWTSLPNAKVNPM



970 980 990 1000 1010 1020

TTG ACG GCT GAC AAG CAA GGA TTG TAC CGT TCA GAC AAC CAC GCG TGG CTC TTT GGA TGG
LTADKQGLYRSDNHAWLFGW

1030 1040 1050 1060 1070 1080 

AAG AAG GGT TCG GTG TTC CAA GCG GGA CCT AGC ACA GCC ATG AAC TGG TAC TAT ACC AGT 
KKGSVFQAGPSTAMNWYYTS 

1090 1100 1110 1120 1130 1140

GGA AGT GGT GAT GTG AAG TCA GCC GGA AAA CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT 
GSGDVKSAGKRQSNRGVAPD

1150 1160 1170 1180 1190 1200 

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC 
AMCGNAVMYDAVKGKILTFG 

1210 1220 1230 1240 1250 1260 

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC GAC GCC CAC ATC ATC ACC CTC GGT
GSPDYQDSDATTDAHIITLG 

1270 1280 1290 1300 1310 1320

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTSPNTVFASNGLYFART 

1330 1340 1350 1360 1370 1380 

TTT CAC ACC TCT GTT GTT CTT CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT 
FHTSVVLPDGSTFITGGQRR

1390 1400 1410 1420 1430 1440 

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA 
GIPFEDSTPVFTPEIYVPEQ 

1450 1460 1470 1480 1490 1500 

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GTC TAC CAT AGC ATT TCC CTT 
DTFYKQNPNSIVRVYHSISL

1510 1520 1530 1540 1550 1560 

TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT TGT GGC GAT TGT ACC ACG 
LLPDGRVFNGGGGLCGDCTT

1570 1580 1590 1600 1610 1620

AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA AAC TAT CTT TAC AAT AGC AAC GGC AAT CTC 
NHFDAQIFTPNYLYNSNGNL

1630 1640 1650 1660 1670 1680

GCG ACA CGT CCC AAG ATT ACC AGA ACC TCA ACA CAG AGC GTC AAG GTC GGT GGC AGA ATT 
ATRPKITRTSTQSVKVGGRI

1690 1700 1710 1720 1730 1740

ACA ATC TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA 
TISTDSSISKASLIRYGTAT

1750 1760 1770 1780 1790 1800

CAC ACG GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT 
HTVNTDQRRIPLTLTNNGGN 
1810 1820 1830 1840 1850 1860

AGC TAT TCT TTC CAA GTT CCT AGC GAC TCT GGT GTT GCT TTG CCT GGC TAC TGG ATG TTG
SYSFQVPSDSGVALPGYWML

1870 1880 1890 1900 1910 1920

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT GTG GCT TCG ACG ATT CGC GTT ACT CAG
FVMNSAGVPSVASTIRVTQ



FIG. 26C 
FIG. 27A



Date: 2000.04.11
Mutant ID: 3.H1
Mutation: N413D (A1237G), S550 (T1650A), V494A (T1481C)
Sequence Size: 1917



10 20 30 40 50 60 

GCC TCA GCA CCT ATC GGA AGC GCC ATT TCT CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT 
ASAPIGSAISRNNWAVTCDS 

70 80 90 100 110 120

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC 
AQSGNECNKAIDGNKDTFWH 

130 140 150 160 170 180

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG 
TFYGANGDPKPPHTYTIDMK 

190 200 210 220 230 240

ACA ACT CAG AAC GTC AAC GGC TTG TCT ATG CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC 
TTQNVNGLSMLPRQDGNQNG



250 260 270 280 290 300

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT
WIGRHEVYLSSDGTNWGSPV



310 320 330 340 350 360 

GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT 
ASGSWFADSTTKYSNFETRP

370 380 390 400 410 420 

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT GAA GCG AAT GGC CAG CCT TGG ACT AGC ATT 
ARYVRLVAITEANGQPWTSI

430 440 450 460 470 480 

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC 
AEINVFQASSYTAPQPGLGR

490 500 510 520 530 540 

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG 
WGPTIDLPIVPAAAAIEPTS

550 560 570 580 590 600 

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC AAT GAT GCA TTT GGA GGA TCC CCT GGT GGT 
GRVLMWSSYRNDAFGGSPGG



610 620 630 640 650 660

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC ACT GGT ATT GTT TCC GAC CGC ACT GTG ACA
ITLTSSWDPSTGIVSDRTVT



670 680 690 700 710 720

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA 
VTKHDMFCPGISMDGNGQIV



730 740 750 760 770 780

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG 
VTGGNDAKKTSLYDSSSDSW

790 800 810 820 830 840

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 
IPGPDMQVARGYQSSATMSD
FIG. 27B



850 860 870 880 890 900 

GGT CGT GTT TTT ACC ATT GGA GGC TCC TGG AGC GGT GGC GTA TTT GAG AAG AAT GGC GAA 
GRVFTIGGSWSGGVFEKNGE

910 920 930 940 950 960 

GTC TAT AGC CCA TCT TCA AAG ACA TGG ACG TCC CTA CCC AAT GCC AAG GTC AAC CCA ATG 
VYSPSSKTWTSLPNAKVNPM

970 980 990 1000 1010 1020 

TTG ACG GCT GAC AAG CAA GGA TTG TAC CGT TCA GAC AAC CAC GCG TGG CTC TTT GGA TGG 
LTADKQGLYRSDNHAWLFGW

1030 1040 1050 1060 1070 1080

AAG AAG GGT TCG GTG TTC CAA GCG GGA CCT AGC ACA GCC ATG AAC TGG TAC TAT ACC AGT 
KKGSVFQAGPSTAMNWYYTS

1090 1100 1110 1120 1130 1140 

GGA AGT GGT GAT GTG AAG TCA GCC GGA AAA CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT 
GSGDVKSAGKRQSNRGVAPD



1150 1160 1170 1180 1190 1200

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC 
AMCGNAVMYDAVKGKILTFG



1210 1220 1230 1240 1250 1260

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC GAC GCC CAC ATC ATC ACC CTC GGT
GSPDYQDSDATTDAHIITLG



1270 1280 1290 1300 1310 1320 

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTSPNTVFASNGLYFART

1330 1340 1350 1360 1370 1380 

TTT CAC ACC TCT GTT GTT CTT CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT 
FHTSVVLPDGSTFITGGQRR

1390 1400 1410 1420 1430 1440

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA 
GIPFEDSTPVFTPEIYVPEQ

1450 1460 1470 1480 1490 1500 

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GCC TAC CAT AGC ATT TCC CTT 
DTFYKQNPNSIVRAYHSISL

1510 1520 1530 1540 1550 1560 

TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT TGT GGC GAT TGT ACC ACG
LLPDGRVFNGGGGLCGDCTT 

1570 1580 1590 1600 1610 1620

AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA AAC TAT CTT TAC AAT AGC AAC GGC AAT CTC 
NHFDAQIFTPNYLYNSNGNL 

1630 1640 1650 1660 1670 1680

GCG ACA CGT CCC AAG ATT ACC AGA ACC TCA ACA CAG AGC GTC AAG GTC GGT GGC AGA ATT 
ATRPKITRTSTQSVKVGGRI

1690 1700 1710 1720 1730 1740

ACA ATC TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA 
TISTDSSISKASLIRYGTAT 

1750 1760 1770 1780 1790 1800

CAC ACG GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT
HTVNTDQRRIPLTLTNNGGN
1810 1820 1830 1840 1850 1860

AGC TAT TCT TTC CAA GTT CCT AGC GAC TCT GGT GTT GCT TTG CCT GGC TAC TGG ATG TTG
SYSFQVPSDSGVALPGYWML

1870 1880 1890 1900 1910 1920

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT GTG GCT TCG ACG ATT CGC GTT ACT CAG
FVMNSAGVPSVASTIRVTQ



FIG. 27C 
FIG. 28A



Date: 2000.04.11
Mutant ID: 4.F12
Mutation: N413D (A1237G), S550 (T1650A), V494A (T1481C), S610 (T1830A)
Sequence Size: 1917



10 20 30 40 50 60 

GCC TCA GCA CCT ATC GGA AGC GCC ATT TCT CGC AAC AAC TGG GCC GTC ACT TGC GAC AGT
ASAPIGSAISRNNWAVTCDS

70 80 90 100 110 120

GCA CAG TCG GGA AAT GAA TGC AAC AAG GCC ATT GAT GGC AAC AAG GAT ACC TTT TGG CAC 
AQSGNECNKAIDGNKDTFWH 

130 140 150 160 170 180 

ACA TTC TAT GGC GCC AAC GGG GAT CCA AAG CCC CCT CAC ACA TAC ACG ATT GAC ATG AAG 
TFYGANGDPKPPHTYTIDMK



190 200 210 220 230 240

ACA ACT CAG AAC GTC AAC GGC TTG TCT ATG CTG CCT CGA CAG GAT GGT AAC CAA AAC GGC 
TTQNVNGLSMLPRQDGNQNG

250 260 270 280 290 300

TGG ATC GGT CGC CAT GAG GTT TAT CTA AGC TCA GAT GGC ACA AAC TGG GGC AGC CCT GTT
WIGRHEVYLSSDGTNWGSPV



310 320 330 340 350 360 

GCG TCA GGT AGT TGG TTC GCC GAC TCT ACT ACA AAA TAC TCC AAC TTT GAA ACT CGC CCT
ASGSWFADSTTKYSNFETRP



370 380 390 400 410 420 

GCT CGC TAT GTT CGT CTT GTC GCT ATC ACT GAA GCG AAT GGC CAG CCT TGG ACT AGC ATT 
ARYVRLVAITEANGQPWTSI

430 440 450 460 470 480 

GCA GAG ATC AAC GTC TTC CAA GCT AGT TCT TAC ACA GCC CCC CAG CCT GGT CTT GGA CGC 
AEINVFQASSYTAPQPGLGR 



490 500 510 520 530 540 

TGG GGT CCG ACT ATT GAC TTA CCG ATT GTT CCT GCG GCT GCA GCA ATT GAA CCG ACA TCG 
WGPTIDLPIVPAAAAIEPTS

550 560 570 580 590 600 

GGA CGA GTC CTT ATG TGG TCT TCA TAT CGC AAT GAT GCA TTT GGA GGA TCC CCT GGT GGT 
GRVLMWSSYRNDAFGGSPGG

610 620 630 640 650 660 

ATC ACT TTG ACG TCT TCC TGG GAT CCA TCC ACT GGT ATT GTT TCC GAC CGC ACT GTG ACA
ITLTSSWDPSTGIVSDRTVT



670 680 690 700 710 720

GTC ACC AAG CAT GAT ATG TTC TGC CCT GGT ATC TCC ATG GAT GGT AAC GGT CAG ATC GTA
VTKHDMFCPGISMDGNGQIV



730 740 750 760 770 780 

GTC ACA GGT GGC AAC GAT GCC AAG AAG ACC AGT TTG TAT GAT TCA TCT AGC GAT AGC TGG 
VTGGNDAKKTSLYDSSSDSW



790 800 810 820 830 840 

ATC CCG GGA CCT GAC ATG CAA GTG GCT CGT GGG TAT CAG TCA TCA GCT ACC ATG TCA GAC 
IPGPDMQVARGYQSSATMSD
FIG. 28B



850 860 870 880 890 900 

GGT CGT GTT TTT ACC ATT GGA GGC TCC TGG AGC GGT GGC GTA TTT GAG AAG AAT GGC GAA 
GRVFTIGGSWSGGVFEKNGE 

910 920 930 940 950 960 

GTC TAT AGC CCA TCT TCA AAG ACA TGG ACG TCC CTA CCC AAT GCC AAG GTC AAC CCA ATG 
VYSPSSKTWTSLPNAKVNPM 

970 980 990 1000 1010 1020 

TTG ACG GCT GAC AAG CAA GGA TTG TAC CGT TCA GAC AAC CAC GCG TGG CTC TTT GGA TGG 
LTADKQGLYRSDNHAWLFGW 

1030 1040 1050 1060 1070 1080 

AAG AAG GGT TCG GTG TTC CAA GCG GGA CCT AGC ACA GCC ATG AAC TGG TAC TAT ACC AGT 
KKGSVFQAGPSTAMNWYYTS

1090 1100 1110 1120 1130 1140 

GGA AGT GGT GAT GTG AAG TCA GCC GGA AAA CGC CAG TCT AAC CGT GGT GTA GCC CCT GAT 
GSGDVKSAGKRQSNRGVAPD

1150 1160 1170 1180 1190 1200

GCC ATG TGC GGA AAC GCT GTC ATG TAC GAC GCC GTT AAA GGA AAG ATC CTG ACC TTT GGC 
AMCGNAVMYDAVKGKILTFG 

1210 1220 1230 1240 1250 1260

GGC TCC CCA GAT TAT CAA GAC TCT GAC GCC ACA ACC GAC GCC CAC ATC ATC ACC CTC GGT 
GSPDYQDSDATTDAHIITLG 

1270 1280 1290 1300 1310 1320 

GAA CCC GGA ACA TCT CCC AAC ACT GTC TTT GCT AGC AAT GGG TTG TAC TTT GCC CGA ACG 
EPGTSPNTVFASNGLYFART

1330 1340 1350 1360 1370 1380 

TTT CAC ACC TCT GTT GTT CTT CCA GAC GGA AGC ACG TTT ATT ACA GGA GGC CAA CGA CGT 
FHTSVVLPDGSTFITGGQRR

1390 1400 1410 1420 1430 1440 

GGA ATT CCG TTC GAG GAT TCA ACC CCG GTA TTT ACA CCT GAG ATC TAC GTC CCT GAA CAA 
GIPFEDSTPVFTPEIYVPEQ

1450 1460 1470 1480 1490 1500 

GAC ACT TTC TAC AAG CAG AAC CCC AAC TCC ATT GTT CGC GCC TAC CAT AGC ATT TCC CTT 
DTFYKQNPNSIVRAYHSISL

1510 1520 1530 1540 1550 1560 

TTG TTA CCT GAT GGC AGG GTA TTT AAC GGT GGT GGT GGT CTT TGT GGC GAT TGT ACC ACG 
LLPDGRVFNGGGGLCGDCTT 

1570 1580 1590 1600 1610 1620

AAT CAT TTC GAC GCG CAA ATC TTT ACG CCA AAC TAT CTT TAC AAT AGC AAC GGC AAT CTC 
NHFDAQIFTPNYLYNSNGNL 

1630 1640 1650 1660 1670 1680 

GCG ACA CGT CCC AAG ATT ACC AGA ACC TCA ACA CAG AGC GTC AAG GTC GGT GGC AGA ATT 
ATRPKITRTSTQSVKVGGRI 

1690 1700 1710 1720 1730 1740 

ACA ATC TCG ACG GAT TCT TCG ATT AGC AAG GCG TCG TTG ATT CGC TAT GGT ACA GCG ACA 
TISTDSSISKASLIRYGTAT

1750 1760 1770 1780 1790 1800

CAC ACG GTT AAT ACT GAC CAG CGC CGC ATT CCC CTG ACT CTG ACA AAC AAT GGA GGA AAT 
HTVNTDQRRIPLTLTNNGGN
1810 1820 1830 1840 1850 1860

AGC TAT TCT TTC CAA GTT CCT AGC GAC TCA GGT GTT GCT TTG CCT GGC TAC TGG ATG TTG 
SYSFQVPSDSGVALPGYWML

1870 1880 1890 1900 1910 1920

TTC GTG ATG AAC TCG GCC GGT GTT CCT AGT GTG GCT TCG ACG ATT CGC GTT ACT CAG 
FVMNSAGVPSVASTIRVTQ 



FIG. 28C 